Similar literature
 20 similar documents found (search time: 234 ms)
1.
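1. Introduction. The rapid development of computing technology has provided good tools for the numerical simulation of aerodynamic problems. Although research on numerical methods has made great progress in recent years, improving computational efficiency remains an important topic. Explicit methods are simple to compute, but their time step is severely restricted; the implicit method proposed by Beam and Warming improves computational efficiency. Generally speaking, an implicit method requires inverting a block pentadiagonal matrix for two-dimensional problems and a block heptadiagonal matrix for three-dimensional problems. There is as yet no good method for solving such systems directly; approximate factorization greatly simplifies the solution process, …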

2.
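The chasing method (Thomas algorithm) for solving cyclic tridiagonal linear systems   Total citations: 9, self-citations: 0, citations by others: 9
Solving cyclic tridiagonal and cyclic Toeplitz tridiagonal linear systems has wide applications in scientific and engineering computing. A direct solution method for such systems is given using matrix factorization. By analyzing the properties of these systems, a truncated algorithm that attains machine precision is given, whose computational complexity is almost the same as that of solving a single tridiagonal linear system. The results of the numerical experiments agree closely with the theoretical analysis. The algorithm is also extended to quasi-tridiagonal linear systems.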
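To make the problem in this abstract concrete, here is a minimal sketch of one well-known way to solve a cyclic tridiagonal system at roughly the cost of two ordinary tridiagonal solves. It is not the factorization or truncated algorithm of the paper: it swaps in the textbook combination of the Thomas algorithm with the Sherman-Morrison formula. The function names and the storage convention (`lower[0]` holds the top-right corner, `upper[-1]` holds the bottom-left corner) are assumptions made for illustration, and no pivoting is done, so a diagonally dominant matrix is assumed.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: forward elimination, then back substitution.

    lower[i] sits left of diag[i] (lower[0] is ignored);
    upper[i] sits right of diag[i] (upper[-1] is ignored).
    No pivoting, so the matrix is assumed to be e.g. diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_cyclic_tridiagonal(lower, diag, upper, rhs):
    """Solve A x = rhs where A is tridiagonal plus two corner entries.

    Assumed convention: lower[0] = A[0][n-1] and upper[-1] = A[n-1][0].
    Writes A = T + u v^T with T tridiagonal and applies Sherman-Morrison.
    """
    n = len(diag)
    if n < 3:
        raise ValueError("need at least a 3x3 system")
    alpha = upper[-1]  # bottom-left corner A[n-1][0]
    beta = lower[0]    # top-right corner A[0][n-1]
    gamma = -diag[0]   # free parameter; this choice keeps T[0][0] nonzero
    # u = (gamma, 0, ..., 0, alpha), v = (1, 0, ..., 0, beta / gamma)
    mod_diag = list(diag)
    mod_diag[0] = diag[0] - gamma
    mod_diag[-1] = diag[-1] - alpha * beta / gamma
    u = [0.0] * n
    u[0] = gamma
    u[-1] = alpha
    y = solve_tridiagonal(lower, mod_diag, upper, rhs)  # T y = rhs
    z = solve_tridiagonal(lower, mod_diag, upper, u)    # T z = u
    v_dot_y = y[0] + beta * y[-1] / gamma
    v_dot_z = z[0] + beta * z[-1] / gamma
    factor = v_dot_y / (1.0 + v_dot_z)
    return [yi - factor * zi for yi, zi in zip(y, z)]
```

For example, `solve_cyclic_tridiagonal([2, 1, 1, 1], [4, 4, 4, 4], [1, 1, 1, 3], [14, 12, 18, 22])` solves a 4×4 system whose corner entries are 2 (top right) and 3 (bottom left).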

3.
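Applying the asymptotic waveform evaluation (AWE) technique to compute the wideband radar cross section (RCS) of a target can effectively improve computational efficiency. However, when the target is electrically large, inverting the impedance matrix becomes very time-consuming or even infeasible. A Krylov subspace iterative method is proposed in place of the matrix inverse for solving the large matrix equation, and a dual-threshold incomplete LU factorization preconditioner is applied to reduce the number of iterations required. Numerical results show that the method agrees well with point-by-point method-of-moments solutions and that computational efficiency is greatly improved.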

4.
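An improved simple algorithm for computing the reachability matrix from the adjacency matrix   Total citations: 1, self-citations: 0, citations by others: 1
The traditional algorithm for computing the reachability matrix from the adjacency matrix is computationally expensive, is unsuitable for hand calculation, and comes with no corresponding computer-oriented algorithm. This paper introduces the concept of a transfer matrix and improves on it to form a complete and feasible method for computing the reachability matrix, which effectively reduces the amount of computation.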
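As a point of reference for what "computing the reachability matrix from the adjacency matrix" means, the following is a minimal sketch using Warshall's algorithm. It is a standard baseline, not the transfer-matrix method of the paper. The function name is hypothetical, and the sketch assumes the convention that every node reaches itself (the Boolean closure of A + I).

```python
def reachability_matrix(adjacency):
    """Return the 0/1 reachability matrix of a directed graph.

    adjacency[i][j] is truthy when there is an edge i -> j.
    Assumed convention: every node is reachable from itself.
    """
    n = len(adjacency)
    reach = [[bool(adjacency[i][j]) or i == j for j in range(n)]
             for i in range(n)]
    # Warshall's algorithm: allow node k as an intermediate stop, one k at a time.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return [[int(value) for value in row] for row in reach]
```

The triple loop costs O(n³) Boolean operations, which is the kind of cost that motivates cheaper hand-calculation schemes such as the one the abstract describes.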

5.
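Solving large-scale triangular linear systems is an important computational kernel in scientific and engineering applications. Limited by processor cache capacity and architectural design, its computational efficiency on platforms such as CPUs and GPUs is low. In the blocked solution of large-scale triangular linear systems, matrix multiplication is the dominant operation, and its efficiency is critical to the efficiency of the triangular solve. Taking a matrix multiplication coprocessor, which achieves high matrix multiplication efficiency, as the computing platform, an implementation method and a performance analysis model for the blocked solution of large-scale triangular linear systems on the coprocessor are proposed according to its architectural features. Experimental results show that the computational efficiency of solving large-scale triangular linear systems on the matrix multiplication coprocessor reaches up to 85.9%, and that its actual performance and resource utilization are 2.42 times and 10.72 times those of a GPU built on the same process technology, respectively.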

6.
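For the computation of the response density matrix in density functional perturbation theory, a new parallel method for solving the Sternheimer equation is proposed: the equation is solved with a conjugate gradient algorithm and with a direct matrix factorization algorithm, and both algorithms are implemented in the first-principles molecular simulation software FHI-aims. Experimental results show that both algorithms yield high accuracy, with smaller errors than traditional methods, and are scalable, which verifies the correctness and effectiveness of the linear-equation solution in the new Sternheimer equation method.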

7.
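Computing certain specific eigenvalues and eigenvectors of a symmetric matrix is an important problem in many areas of scientific computing. In electronic structure calculations in particular, eigenvalue computation becomes the bottleneck. In applications that require most of the eigenvalues and eigenvectors, direct solution has generally been used. To make better use of memory performance, we designed a diagonalization algorithm that splits the reduction and back-transformation processes; by redesigning the whole procedure, it takes full advantage of the memory hierarchy, increases single-core speed, and improves parallel efficiency. In this paper we focus on the back-transformation from the tridiagonal matrix to the banded matrix. The algorithm described in this paper is used in the MESIA electronic structure calculation package, where it achieved a certain performance improvement.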

8.
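Many scientific and engineering computing problems reduce to solving large linear systems. The conjugate gradient (CG) method and the successive over-relaxation (SOR) method are the most commonly used iterative methods; they are used either to solve linear systems directly or to refine an approximate solution obtained by a direct method. In both iterative methods, the product of the coefficient matrix and a column vector accounts for a large share of the computation. Therefore, reducing the time spent locating operands, and in particular reducing the number of addressing operations and data I/O operations for large sparse systems whose coefficient matrix is stored in blocks on external storage, is crucial to run-time efficiency. This paper presents CG and SOR algorithms suited to two common data structures. They save almost half of the addressing time and even more of the I/O time, especially when there is a large amount of I/O.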
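For readers unfamiliar with SOR, one of the two iterations named in this abstract, here is a minimal dense sketch. It illustrates the basic sweep only and does not reproduce the paper's data structures or its addressing and I/O optimizations. The function name, the default relaxation factor `omega=1.2`, and the stopping rule (largest change in one sweep) are assumptions for illustration.

```python
def sor_solve(a, b, omega=1.2, tol=1e-10, max_iter=10_000):
    """Solve a x = b by successive over-relaxation.

    a is a dense list of rows with nonzero diagonal entries.
    For a symmetric positive definite matrix the sweep converges
    for any relaxation factor 0 < omega < 2.
    Returns (x, number_of_sweeps).
    """
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Row i times the current iterate, skipping the diagonal entry.
            # Updated entries x[0..i-1] are used immediately (Gauss-Seidel style).
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / a[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x, sweep
    raise RuntimeError("SOR did not converge within max_iter sweeps")
```

The inner sum is exactly the row-times-vector product the abstract identifies as the dominant cost, which is why the storage layout of the coefficient matrix matters so much for large sparse systems.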

9.
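In multi-loop control performance assessment, solving for the unitary interactor matrix is important, since it affects the determination of the time-delay term and the calculation of the control performance index; the computation of the interactor matrix is therefore a topic worth exploring. Solving for the interactor matrix generally requires some knowledge of the process model, a condition that is rather demanding in practical applications. To avoid this dependence on prior knowledge, a method for solving the unitary interactor matrix based on closed-loop data is proposed. By preprocessing the various identified transfer function models, the diagonal unitary interactor matrix can be solved directly from the polynomial matrix, or the non-diagonal unitary interactor matrix can be solved iteratively. Application results demonstrate the effectiveness of the method.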

10.
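In engineering practice, many problems reduce to the numerical solution of partial differential equations (or systems of them). Numerical methods for partial differential equations mainly include the finite difference, finite element, and finite volume methods. Most of these methods discretize the equation into a system of linear equations and obtain the numerical solution of the original equation by solving that linear system. In this process, the coefficient matrix of the linear system is usually very large and very sparse, which consumes a large amount of storage and makes the system hard to solve. To address this problem, this paper studies compressed storage methods for large sparse matrices, which store only the nonzero elements, reduce storage consumption, keep zero elements out of the computation, and improve computational efficiency. Specifically, during sparse matrix generation, orthogonal (cross-linked) list storage is used, so that a nonzero element can be inserted in constant time; during the solution of the system, compressed row (column) storage is used, which both saves storage and improves solver efficiency. In the experiments, the finite difference method is used to solve the Laplace equation and the finite element method is used to compute the stress distribution over an annular cross-section; the coefficient matrices of the resulting large sparse linear systems are stored with the orthogonal list and compressed row (column) methods, and the linear systems are solved with direct and iterative methods. The results show that, for both structured and unstructured sparse matrices, compressed storage not only greatly reduces memory usage but also significantly improves solver efficiency.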
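The two-phase storage idea in this abstract (a structure that is cheap to insert into during assembly, then compressed row storage for the solver) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: a Python dict stands in for the orthogonal linked list during assembly, and the function names and the three-array layout `(values, col_idx, row_ptr)` are the usual compressed sparse row (CSR) convention chosen here for illustration.

```python
def coo_to_csr(n_rows, entries):
    """Convert (row, col, value) triples to CSR arrays.

    Duplicate (row, col) pairs are summed, as happens during
    finite element or finite difference assembly.
    Returns (values, col_idx, row_ptr).
    """
    merged = {}  # assembly phase: average constant-time insertion per entry
    for row, col, value in entries:
        merged[(row, col)] = merged.get((row, col), 0.0) + value
    values, col_idx = [], []
    row_ptr = [0] * (n_rows + 1)
    for row, col in sorted(merged):  # row-major order
        values.append(merged[(row, col)])
        col_idx.append(col)
        row_ptr[row + 1] += 1        # count nonzeros per row
    for row in range(n_rows):        # prefix sum turns counts into offsets
        row_ptr[row + 1] += row_ptr[row]
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzero entries."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y
```

For a 4×4 one-dimensional finite difference Laplacian with stencil (-1, 2, -1), CSR stores 10 values instead of 16, and the saving grows with problem size because the number of nonzeros per row stays fixed.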

11.
Preconditioning techniques based on incomplete Cholesky factorization are very efficient in increasing the convergence rates of basic iterative methods. Complicated addressing and high demands for auxiliary storage, or increased factorization time, have reduced their appeal as general purpose preconditioners. In this study an elegant computational implementation is presented which succeeds in reducing both computing storage and factorization time. The proposed implementation is applied to two incomplete factorization schemes. The first is based on the rejection of certain terms according to their magnitude, while the second is based on a rejection criterion relative to the position of the zero terms of the coefficient matrix. Numerical results demonstrate the superiority of the proposed preconditioners over other types of preconditioning matrices, particularly for ill-conditioned problems. They also show their efficiency for large-scale problems in terms of computer storage and CPU time, over a direct solution method using the skyline storage scheme.

12.
Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited for big matrices but perform slowly on small ones. Even though state-of-the-art studies have begun to take an interest in small matrices, these usually feature a few hundred rows. Fields like Computer Vision or High Energy Physics use tiny matrices. In this paper we show that it is possible to speed up the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide High Level Transformations that accelerate the factorization for current multi-core and many-core SIMD architectures (SSE, AVX2, KNC, AVX512, Neon, Altivec). We focus on the fact that, on some architectures, compilers are unable to vectorize and on other architectures, vectorizing compilers are not efficient. Thus hand-made SIMDization is mandatory. With these transformations combined with SIMD, we achieve a speedup from ×14 to ×28 for the whole resolution in single precision compared to the naive code on an AVX2 machine, and a speedup from ×6 to ×14 in double precision, both with strong scalability.

13.
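A new ICCG method for large sparse linear systems   Total citations: 2, self-citations: 0, citations by others: 2
The coefficient matrix of a finite element linear system is generally sparse and symmetric. The fully sparse storage method exploits these properties by storing only the nonzero elements of the symmetric part and managing them with linked lists, which both saves storage and makes dynamic modification easy. Building on complete Cholesky factorization, a new preconditioning method is constructed; with a suitable diagonal modification strategy, a new ICCG method is obtained that ensures efficient and accurate factorization and solution of the system. Numerical examples show that the algorithm is advantageous in both time and storage, is reliable and efficient, and can be applied to solving finite element linear systems.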
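Since several abstracts on this page refer to ICCG, a minimal sketch of the textbook version may help: a zero-fill incomplete Cholesky factor, IC(0), used as the preconditioner inside conjugate gradients. This is not the new preconditioner or the diagonal modification strategy of the paper above. Dense lists are used only to keep the sketch short (a real implementation would use sparse storage), all function names are hypothetical, and the code assumes a symmetric positive definite matrix for which IC(0) does not break down.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def incomplete_cholesky0(a):
    """IC(0): Cholesky restricted to the nonzero pattern of a (no fill-in)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            if a[i][j] == 0.0:
                continue  # position is zero in a, so it stays zero in l
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("IC(0) breakdown: non-positive pivot")
                l[i][i] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l


def apply_preconditioner(l, r):
    """Solve (l l^T) z = r by forward then backward substitution."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - sum(l[k][i] * z[k] for k in range(i + 1, n))) / l[i][i]
    return z


def iccg(a, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with an IC(0) preconditioner.

    Stops when the residual norm falls below tol times the norm of b.
    Returns (x, iterations).
    """
    l = incomplete_cholesky0(a)
    x = [0.0] * len(b)
    r = list(b)
    z = apply_preconditioner(l, r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for iteration in range(1, max_iter + 1):
        ap = [dot(row, p) for row in a]
        step = rz / dot(p, ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        z = apply_preconditioner(l, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    raise RuntimeError("ICCG did not converge within max_iter iterations")
```

On a tridiagonal matrix IC(0) coincides with the exact Cholesky factor; the factorization becomes genuinely incomplete on matrices such as the two-dimensional five-point Laplacian, where an exact factor would fill in between the bands.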

14.
The increase of computer performance continues to support the practice of large-scale optimization. Computers with multiple computing cores and vector processing capabilities are now widely available. We investigate how the recently introduced Advanced Vector Extensions (AVX) instruction set on Intel-compatible architectures can be exploited in interior point methods for linear and nonlinear optimization. We focus on data structures and implementation techniques that utilize the new vector instructions. Our numerical experiments demonstrate that the AVX instruction set provides a significant performance boost in our implementation on large-scale problems that have significant fill-in in the sparse Cholesky factorization, achieving up to 100 gigaflops performance on a standard desktop computer on linear optimization problems for which the required Cholesky factorization is relatively dense.

15.
This paper is concerned with principal considerations for developing a linear algebra package for the SUPRENUM computer. The design goals, as well as the mapping strategy of the parallelization methodology, are described briefly. Finally, a basic factorization scheme is introduced which can be readily tailored to the LU, Cholesky and QR factorizations provided that the corresponding matrices are distributed according to the column-oriented wrap mapping.

16.
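Kernel matrix computation is the key to solving support vector machines, and existing exact methods have difficulty handling large-scale sample data. This paper therefore studies approximate computation of the kernel matrix. First, using the convex quadratically constrained linear programming formulation of the support vector machine, second-order cone programming formulations of the support vector machine and the multiple-kernel support vector machine are given. Then, combining the Monte Carlo method with incomplete Cholesky factorization, a new kernel matrix approximation algorithm, KMA-α, is proposed. The algorithm first applies Monte Carlo random sampling to the kernel matrix; after sampling, instead of performing a singular value decomposition directly, it applies incomplete Cholesky factorization with symmetric permutation to compute a near-optimal low-rank approximation. Using the approximate kernel matrix output by KMA-α as the input to the support vector machine improves the efficiency of solving the second-order cone program. Furthermore, the algorithmic complexity of KMA-α is analyzed and a theorem bounding its approximation error is proved. Finally, experiments on standard data sets verify the soundness and computational efficiency of KMA-α. Theoretical analysis and experimental results show that KMA-α is a sound and effective kernel matrix approximation algorithm.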
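The incomplete Cholesky step with symmetric permutation that this abstract mentions can be illustrated with the standard greedy pivoted variant below. It covers only that one ingredient: the Monte Carlo sampling stage and the KMA-α algorithm itself are not reproduced, and the function name, the `max_rank` and `tol` parameters, and the largest-residual-diagonal pivot rule are assumptions for illustration.

```python
import math


def pivoted_incomplete_cholesky(kernel, max_rank, tol=1e-8):
    """Low-rank factor g with kernel ~= g g^T for a symmetric PSD matrix.

    At each step the pivot is the index with the largest remaining
    diagonal residual; the loop stops at max_rank columns or when the
    largest residual is at most tol.
    Returns (g, pivots), where g is a list of n rows, one entry per column.
    """
    n = len(kernel)
    residual_diag = [kernel[i][i] for i in range(n)]
    g = [[] for _ in range(n)]
    pivots = []
    for _ in range(min(max_rank, n)):
        p = max(range(n), key=lambda i: residual_diag[i])
        if residual_diag[p] <= tol:
            break
        root = math.sqrt(residual_diag[p])
        # New column: residual of kernel column p, scaled by the pivot root.
        column = [
            (kernel[i][p] - sum(gi * gp for gi, gp in zip(g[i], g[p]))) / root
            for i in range(n)
        ]
        for i in range(n):
            g[i].append(column[i])
            residual_diag[i] -= column[i] ** 2
        pivots.append(p)
    return g, pivots
```

Only the diagonal and the selected pivot columns of the kernel matrix are read, so a rank-r approximation of an n×n kernel matrix needs O(n r) kernel evaluations rather than n², which is the usual reason for preferring this factorization to a singular value decomposition at scale.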

17.
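Mesoscale numerical simulation is an important means of studying concrete behavior, but solving sparse linear systems accounts for a large share of the total simulation time. Because the problem is three-dimensional and very large, preconditioned Krylov subspace iteration is the only practical choice. Aztec is one of the internationally known software packages designed specifically for solving sparse linear systems. Since the sparse linear systems in current mesoscale concrete simulation are symmetric positive definite, the CG iteration provided by Aztec is used, and several preconditioner options that preserve symmetry are compared experimentally. The results show that, among the preconditioning techniques considered (domain-decomposition-based parallel incomplete Cholesky factorization, non-overlapping symmetrized GS iteration, least squares, and others), the first is the most efficient and performs best with an overlap of 0 and a fill level of 0. The results also show that, in this application, RCM ordering generally leads to longer solve times and is therefore unnecessary.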

18.
In this paper, some parallel algorithms are described for solving numerical linear algebra problems on Dawning-1000. They include matrix multiplication, LU factorization of a dense matrix, Cholesky factorization of a symmetric matrix, and eigendecomposition of a symmetric matrix for real and complex data types. These programs are built on the fast BLAS library of Dawning-1000 under the NX environment. Some comparison results under different parallel environments and implementation methods are also given for Cholesky factorization. The execution time, measured performance and speedup for each problem on Dawning-1000 are shown. For matrix multiplication and LU factorization, 1.86 GFLOPS and 1.53 GFLOPS are reached.

19.
The present paper is dedicated to the preconditioning of boundary element matrices which are given in wavelet coordinates. We investigate the incomplete Cholesky factorization (ICF) for a pattern which also includes the coefficients of all off-diagonal bands associated with the level–level interactions. The pattern is chosen in such a way that the ICF is computable in log-linear complexity. Numerical experiments are performed to quantify the effects of the proposed preconditioning.

20.
We present a fast method for simulating, animating, and rendering lightning using adaptive grids. The "dielectric breakdown model" is an elegant algorithm for electrical pattern formation that we extend to enable animation of lightning. The simulation can be slow, particularly in 3D, because it involves solving a large Poisson problem. Losasso et al. recently proposed an octree data structure for simulating water and smoke, and we show that this discretization can be applied to the problem of lightning simulation as well. However, implementing the incomplete Cholesky conjugate gradient (ICCG) solver for this problem can be daunting, so we provide an extensive discussion of implementation issues. ICCG solvers can usually be accelerated using "Eisenstat's trick," but the trick cannot be directly applied to the adaptive case. Fortunately, we show that an "almost incomplete Cholesky" factorization can be computed so that Eisenstat's trick can still be used. We then present a fast rendering method based on convolution that is competitive with Monte Carlo ray tracing but orders of magnitude faster, and we also show how to further improve the visual results using jittering.
