1.

赖智超罗晓群张其林《计算机辅助工程》2014,23(2):46-52

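A compressed sparse column (CSC) storage scheme is used, and a fill-reducing ordering is applied to the matrix before LDL factorization. Symbolic LDL factorization is carried out on the elimination tree so that it is independent of the numerical factorization, which avoids excess memory consumption and unnecessary numerical operations. A supernodal LDL factorization algorithm is analyzed and implemented by exploiting the distribution of the matrix nonzeros; it turns the factorization of the sparse matrix into a series of dense matrix operations, which are accelerated with an optimized BLAS library. Tests show that the algorithm multiplies computation speed while further reducing memory consumption, and that it is suitable for large-scale structural computation.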
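As a rough illustration of the CSC storage mentioned in this abstract, the sketch below builds the three CSC arrays from a small dense matrix and uses them for a matrix-vector product. The function names `dense_to_csc` and `csc_matvec` are hypothetical and are not taken from the paper; the paper's own data structures and its LDL factorization are not reproduced here.

```python
def dense_to_csc(dense):
    """Convert a dense matrix (list of rows) into CSC arrays.

    Returns (col_ptr, row_idx, values): the nonzeros of column j are
    values[col_ptr[j]:col_ptr[j + 1]], with row indices in row_idx.
    Hypothetical helper for illustration only.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    col_ptr, row_idx, values = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                row_idx.append(i)
                values.append(dense[i][j])
        col_ptr.append(len(values))  # end of column j
    return col_ptr, row_idx, values


def csc_matvec(col_ptr, row_idx, values, x, n_rows):
    """Compute y = A @ x by walking the stored nonzeros column by column."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * x[j]
    return y


A = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [1.0, 0.0, 2.0]]
col_ptr, row_idx, values = dense_to_csc(A)
y = csc_matvec(col_ptr, row_idx, values, [1.0, 1.0, 1.0], 3)
```

Only the nonzeros and their row indices are stored, plus one pointer per column, which is why column-oriented factorizations can traverse a column without scanning zeros.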

2.

Sparsity-regularized image super-resolution algorithm based on forward-backward operator splitting

孙玉宝费选韦志辉肖亮《自动化学报》2010,36(9):1232-1238

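A new convex variational model for multi-frame image super-resolution based on sparse-representation regularization is proposed. The regularization term captures the sparsity prior of the ideal image in a frame system, the fidelity term measures its consistency with the observed signals under the degradation model, and the optimality conditions are analyzed. A fixed-point iterative algorithm based on forward-backward operator splitting is then proposed to solve the model. Each iteration splits into a forward (explicit) step on the fidelity term alone and a backward (implicit) step on the regularization term alone, which greatly reduces computational complexity. The convergence of the algorithm is analyzed, and a sequential strategy is adopted to speed up convergence. Numerical simulations on visible-light and infrared image sequences verify the effectiveness of the model and the numerical algorithm.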
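The forward/backward iteration described above can be sketched on a much simpler problem. The example below is an assumption-laden stand-in, not the paper's super-resolution model: it minimizes 0.5·||Ax − b||² + lam·||x||₁, where the forward step is a gradient step on the quadratic fidelity term and the backward step is the proximal map of the l1 term (soft thresholding). The names `forward_backward` and `soft_threshold`, and the choice of an l1 regularizer, are illustrative.

```python
def soft_threshold(v, t):
    """Proximal map of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def forward_backward(A, b, lam, step, n_iter):
    """Forward-backward splitting for 0.5*||Ax - b||^2 + lam*||x||_1.

    A is a list of rows, b a list; step should not exceed 1 / ||A^T A||
    for convergence. Illustrative sketch only.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Forward (explicit) step: gradient of the fidelity term, A^T (A x - b).
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Backward (implicit) step: proximal map of the regularization term.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


identity = [[1.0, 0.0], [0.0, 1.0]]
x_hat = forward_backward(identity, [3.0, 0.5], lam=1.0, step=1.0, n_iter=5)
```

With the identity as `A`, the minimizer is simply the soft-thresholded data, which makes the result easy to check by hand.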

3.

GPU-accelerated incomplete Cholesky factorization preconditioned conjugate gradient method

陈尧赵永华赵慰赵莲《计算机研究与发展》2015,(4):843-850

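The incomplete Cholesky factorization preconditioned conjugate gradient (ICCG) method is an effective way to solve large sparse symmetric positive definite linear systems. However, ICCG requires solving two sparse triangular systems in every iteration, and the inherently serial nature of sparse triangular solves becomes the bottleneck when ICCG is run in parallel on a GPU. An effective GPU-accelerated method for sparse triangular solves is presented. To increase multithreaded parallelism on the GPU, level scheduling is applied to the sparse triangular matrices produced by the incomplete Cholesky factorization. To improve parallel performance further, the coefficient matrix is reordered with the approximate minimum degree (AMD) algorithm before level scheduling, and the sparse triangular matrices are sorted by level afterwards; this reduces the number of levels produced by level scheduling and optimizes the GPU memory access pattern of the triangular solves. Numerical experiments show that, compared with an ICCG implementation based on NVIDIA CUSPARSE, the proposed method improves performance by more than 100% on average.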
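To make the idea of level scheduling concrete, here is a minimal serial sketch, assuming a lower triangular matrix stored row by row as lists of `(column, value)` pairs with the diagonal included. Rows in the same level have no dependencies on each other, so on a GPU each level could be processed by parallel threads; this sketch only computes the levels and solves level by level on the CPU. The names `level_schedule` and `solve_by_levels` are hypothetical, and the AMD reordering and level sorting from the paper are not shown.

```python
def level_schedule(rows):
    """Group the rows of a sparse lower triangular matrix into levels.

    rows[i] is a list of (col, value) pairs for row i, diagonal included.
    Row i is placed one level after the deepest row it depends on.
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        deps = [level[j] for j, _ in rows[i] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    groups = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups


def solve_by_levels(rows, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(b)
    for group in level_schedule(rows):
        # Rows within one group are mutually independent; a GPU kernel
        # could assign one thread (or warp) per row here.
        for i in group:
            s, diag = b[i], None
            for j, v in rows[i]:
                if j == i:
                    diag = v
                else:
                    s -= v * x[j]
            x[i] = s / diag
    return x


L = [[(0, 2.0)],
     [(1, 4.0)],
     [(0, 1.0), (2, 1.0)],
     [(1, 1.0), (2, 1.0), (3, 2.0)]]
levels = level_schedule(L)
x = solve_by_levels(L, [2.0, 4.0, 3.0, 7.0])
```

Fewer levels mean fewer synchronization points between parallel batches, which is why reducing the level count matters for GPU performance.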

4.

Image fusion method based on multi-scale sparse representation

首照宇胡蓉欧阳宁张彤《计算机工程与设计》2015,36(1)

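To address the high computational complexity of common sparse-representation-based image fusion algorithms and their neglect of local image features, an image fusion method based on multi-scale sparse representation (MSR) is proposed. The method exploits the ability of wavelet multi-scale analysis to highlight local image features and combines it with overcomplete sparse representation. The images to be fused undergo multi-level wavelet decomposition in the wavelet domain; the features at each scale are sparsely coded by OMP (orthogonal matching pursuit) over a K-SVD (kernel singular value decomposition) multi-scale dictionary, and fusion is performed at each scale in the wavelet domain. Experimental results show that, compared with fusion algorithms based on the traditional wavelet transform, the contourlet transform, and sparse representation, the proposed algorithm better preserves the integrity of local image features and achieves better performance.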

5.

Cartoon-texture model based on curvelets and sparse representation

康晓东王昊郭宏郭军《计算机应用》2012,32(10):2786-2789

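Denoising and restoration of CT images is a fundamental step in medical image processing. To address the computational difficulty and low accuracy of the cartoon-texture model in medical image denoising, the decomposition method of the cartoon-texture model is extended. First, the curvelet transform is used to describe the structure part of the image cartoon-texture model. Second, the sparser dual-tree complex wavelet transform is used to describe the texture part. Finally, an image cartoon-texture decomposition model combining curvelets and sparse representation is established, and its decomposition algorithm is discussed. Simulation results show that the new method effectively addresses the heavy iterative computation of medical image denoising algorithms and improves the quality of the processed images.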

6.

Construction of a multiscale ridgelet dictionary and its application to image coding

邓承志曹汉强《中国图象图形学报》2009,14(7):1273-1278

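In transform-domain image coding, sparse representation of the image is the key to coding. After analyzing the shortcomings of the ridgelet function, this paper first proposes a method for constructing a multiscale ridgelet dictionary and organizes the atoms in a tree structure, which speeds up the search for the best-matching atom during sparse image decomposition. It then proposes a matching pursuit still-image coding method based on the tree-structured multiscale ridgelet dictionary. Finally, by analyzing quantization distortion and coding rate together with the distribution of the sparse decomposition coefficients, it proposes an adaptive quantization and coding scheme for the coefficients. Experimental results show that the multiscale ridgelet dictionary represents images sparsely and effectively, and that the new coding algorithm outperforms JPEG2000, especially at low bit rates.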

7.

A new method for wideband signal DOA estimation

王峰王建英《微计算机信息》2007,23(30):128-130

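This paper studies the application of sparse signal decomposition to array signal processing. It brings non-orthogonal signal decomposition into the field of array signal processing, moving beyond the idea of orthogonal signal decomposition. By computing the sparse decomposition of the sensor array output signals, super-resolution estimation of the spatial spectrum is achieved, and a new direction of arrival (DOA) estimation algorithm for wideband sources is proposed. At low signal-to-noise ratios the new algorithm outperforms traditional DOA estimation algorithms, and computer simulations verify its effectiveness.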

8.

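A new preconditioned ILUCG method for solving sparse ill-conditioned linear systems Total citations: 3, self-citations: 0, citations by others: 3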

于春肖苑润浩穆运峰《数值计算与计算机应用》2014,(1):21-27

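Efficient solution of large sparse ill-conditioned linear systems plays an important role in scientific computing and engineering applications. For general nonsymmetric positive definite, nonsingular systems of linear algebraic equations, this paper first introduces common techniques for constructing incomplete LU factorization preconditioners. It then presents the SSOR preconditioning splitting and an improved splitting and, based on the ILUCG idea, proposes a new preconditioned ILUCG method together with a convergence analysis. Finally, numerical simulation experiments show that the algorithm is effective and feasible, and that it outperforms the standard preconditioned ILUCG method for solving sparse ill-conditioned systems.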

9.

A fourth-order high-resolution entropy-consistent algorithm

郑素佩封建湖《计算机应用》2013,33(9):2416-2418

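A fourth-order high-resolution entropy-consistent algorithm is proposed for the numerical solution of the one-dimensional Burgers equation and the one-dimensional Euler equations. The new algorithm is semi-discrete in time, uses fourth-order central weighted essentially non-oscillatory (CWENO) reconstruction in space, and adopts the Ismail flux function as the numerical flux. It is applied to a stationary shock problem, a shock tube problem, and a strong rarefaction wave problem, and the results are analyzed and compared with the exact solutions and with the results of existing algorithms. The numerical results show that the new algorithm computes correct results with high resolution, captures shocks and rarefaction waves accurately, and effectively avoids expansion shocks. The new algorithm is suitable for accurate numerical solution of the one-dimensional Burgers equation and the one-dimensional Euler equations.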

10.

An improved FFT-based algorithm for sparse decomposition of speech signals

刘强尹忠科王建英《计算机工程与应用》2007,43(26):74-75

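Sparse decomposition of speech signals based on the matching pursuit (MP) method is studied. By analyzing the structural properties of the overcomplete atom dictionary used in sparse decomposition of speech signals, an improved sparse decomposition algorithm is proposed. Tailored to the characteristics of speech signals, the algorithm builds on FFT-based sparse decomposition and narrows the search range of atoms, which both further speeds up the decomposition and represents the speech signal more sparsely. Experimental results confirm the effectiveness of the algorithm.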
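For readers unfamiliar with matching pursuit, the following is a minimal sketch of the basic greedy loop, assuming unit-norm atoms stored as plain Python lists. It performs an exhaustive search over all atoms and does not include the FFT acceleration or the reduced search range that the paper proposes; the function name `matching_pursuit` and the toy dictionary are illustrative.

```python
def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse decomposition of `signal` over unit-norm `atoms`.

    At each step the atom with the largest absolute inner product with
    the residual is selected and its contribution is subtracted.
    Returns (coeffs, residual), where coeffs maps atom index -> weight.
    """
    residual = list(signal)
    coeffs = {}
    for _ in range(n_iter):
        products = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(products[i]))
        c = products[k]
        if c == 0:
            break  # residual is orthogonal to every atom
        coeffs[k] = coeffs.get(k, 0.0) + c
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Toy dictionary: the standard basis of R^3 (already unit norm).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([3.0, -1.0, 0.0], atoms, n_iter=2)
```

The inner-product search dominates the cost of each iteration, which is the step that FFT-based schemes and restricted atom searches aim to speed up.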

11.

A CPU-GPU hybrid approach for the unsymmetric multifrontal method Total citations: 1, self-citations: 0, citations by others: 1

Chenhan D. Yu Weichung Wang Dan’l Pierce 《Parallel Computing》2011,37(12):759-770

The multifrontal method is an efficient direct method for solving large-scale sparse and unsymmetric linear systems. The method transforms a large sparse matrix factorization process into a sequence of factorizations involving smaller dense frontal matrices. Some of these dense operations can be accelerated by using a graphics processing unit (GPU). We analyze the unsymmetric multifrontal method from both an algorithmic and implementational perspective to see how a GPU, in particular the NVIDIA Tesla C2070, can be used to accelerate the computations. Our main accelerating strategies include (i) performing BLAS on both CPU and GPU, (ii) improving the communication efficiency between the CPU and GPU by using page-locked memory, zero-copy memory, and asynchronous memory copy, and (iii) a modified algorithm that reuses the memory between different GPU tasks and sets thresholds to determine whether certain tasks should be performed on the GPU. The proposed acceleration strategies are implemented by modifying UMFPACK, which is an unsymmetric multifrontal linear system solver. Numerical results show that the CPU-GPU hybrid approach can accelerate the unsymmetric multifrontal solver, especially for computationally expensive problems.

12.

Logarithmic barriers for sparse matrix cones

Martin S. Andersen Joachim Dahl Lieven Vandenberghe 《Optimization methods & software》2013,28(3):396-423

Algorithms are presented for evaluating gradients and Hessians of logarithmic barrier functions for two types of convex cones: the cone of positive semidefinite matrices with a given sparsity pattern and its dual cone, the cone of sparse matrices with the same pattern that have a positive semidefinite completion. Efficient large-scale algorithms for evaluating these barriers and their derivatives are important in interior-point methods for nonsymmetric conic formulations of sparse semidefinite programs. The algorithms are based on the multifrontal method for sparse Cholesky factorization.

13.

Complex flow simulations in natural aquifer

Hussein Mustapha 《Advanced Engineering Informatics》2009,23(3):288-293

Natural aquifers are complex media and contain heterogeneous structures. This paper introduces a new algorithm to simulate fluid flow in such complex media. A parallel version of the method is released, and two well-known sparse linear solvers, based, respectively, on a multifrontal Cholesky factorization and an iterative structured multigrid method, are tested. The mixed finite element (MFE) method is used to discretize Darcy’s equation. The efficiency of the proposed algorithm is shown in several numerical examples.

14.

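A strategy for combining LU factorization with iterative methods and its algorithmic implementation Total citations: 3, self-citations: 1, citations by others: 3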

李滨郑赟叶以正肖立伊黄国勇《计算机工程与设计》2002,23(3):16-21

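Among matrix solution algorithms, neither direct methods nor iterative methods alone can efficiently solve large sparse or ill-conditioned matrices, so a strategy that combines LU factorization with iterative methods is proposed. LU factorization is used to precondition the matrix and improve the convergence of the iterative method, and a decision strategy allows the LU factors to be reused as much as possible. Applying this strategy to two conjugate gradient (CG) methods yields two algorithms, CLUCG and CLUTCG, which have been used in ZeniVDE, an analog and mixed-signal circuit simulator. Extensive experiments show that the strategy is very effective and that the two algorithms are fast and converge well.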

15.

Corrigendum to “Complex flow simulation in natural aquifer” [Adv. Eng. Inform. 23 (2009) 288–293]: An algorithm for parallel flow simulations in the finite element framework

H. Mustapha A. Ghorayeb K. Mustapha 《Advanced Engineering Informatics》2013,27(1):149-156

Advanced parallel computing techniques can be employed for a better understanding of groundwater flow. Generally, geological media are very heterogeneous and contain complex structures. Decomposing these structures into approximately equivalent sub-structures for load balancing is a major challenge. This paper proposes and analyzes a new algorithm to simulate fluid flow in parallel in such complex media. Fully parallel software is developed, and two well-known sparse linear solvers, based respectively on a multifrontal Cholesky factorization and an iterative structured multigrid method, are compared. The mixed finite element (MFE) method is used to discretize Darcy’s equation. Numerical examples are presented to show the efficiency and robustness of the proposed algorithm.

16.

Highly scalable parallel algorithms for sparse matrix factorization

Gupta A. Karypis G. Kumar V. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):502-520

In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.

17.

Accelerating implicit integration in multi-body dynamics using GPU computing

Jihyun Jung Daesung Bae 《Multibody System Dynamics》2018,42(2):169-195

A new direct linear equation solver is proposed for GPUs. The proposed solver is applied to mechanical system analysis. In contrast to the DFS post-order traversal, which is widely used in conventional implementations of supernodal and multifrontal methods, the BFS reverse-level order traversal has been adopted to obtain more parallelism and a more adaptive control of data size. The proposed implementation allows solving large problems efficiently on many kinds of GPUs. Separators are divided into smaller blocks to further improve the parallel efficiency. Numerical experiments show that the proposed method generally requires less factorization time than CHOLMOD and has better operational availability than SPQR. Mechanical dynamic analysis has been carried out to show the efficiency of the proposed method. The computing time, memory usage, and solution accuracy are compared with those obtained from DSS included in MKL. Compared with the CPU device used in the experiments, the GPU implementation is about 2.5–5.9 times faster in the numerical factorization step and approximately 1.9–4.7 times faster over the whole analysis process.

18.

Exploiting hardware capabilities in interior point methods

Csaba Mészáros 《Optimization methods & software》2016,31(2):435-443

The increase of computer performance continues to support the practice of large-scale optimization. Computers with multiple computing cores and vector processing capabilities are now widely available. We investigate how the recently introduced Advanced Vector Instruction (AVX) set on Intel-compatible architectures can be exploited in interior point methods for linear and nonlinear optimization. We focus on data structures and implementation techniques that utilize the new vector instructions. Our numerical experiments demonstrate that the AVX instruction set provides a significant performance boost in our implementation on large-scale problems that have significant fill-in in the sparse Cholesky factorization, achieving up to 100 gigaflops performance on a standard desktop computer on linear optimization problems for which the required Cholesky factorization is relatively dense.

19.

GPU-accelerated preconditioned iterative linear solvers Total citations: 1, self-citations: 1, citations by others: 0

Ruipeng Li Yousef Saad 《The Journal of supercomputing》2013,63(2):443-466

This work is an overview of our preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. Our experiments with an NVIDIA TESLA M2070 show that for unstructured matrices SpMV kernels can be up to 8 times faster on the GPU than the Intel MKL on the host Intel Xeon X5675 Processor. Overall performance of the GPU-accelerated Incomplete Cholesky (IC) factorization preconditioned CG method can outperform its CPU counterpart by a smaller factor, up to 3, and the GPU-accelerated incomplete LU (ILU) factorization preconditioned GMRES method can achieve a speed-up nearing 4. However, with better suited preconditioning techniques for GPUs, this performance can be further improved.

20.

Blind estimation of channel parameters and source components for EEG signals: a sparse factorization approach

Yuanqing Li Cichocki A. Amari S.-I. 《Neural Networks, IEEE Transactions on》2006,17(2):419-431

In this paper, we use a two-stage sparse factorization approach for blindly estimating the channel parameters and then estimating source components for electroencephalogram (EEG) signals. EEG signals are assumed to be linear mixtures of source components, artifacts, etc. Therefore, a raw EEG data matrix can be factored into the product of two matrices, one of which represents the mixing matrix and the other the source component matrix. Furthermore, the components are sparse in the time-frequency domain, i.e., the factorization is a sparse factorization in the time-frequency domain. It is a challenging task to estimate the mixing matrix. Our extensive analysis and computational results, which were based on many sets of EEG data, not only provide firm evidence supporting the above assumption, but also prompt us to propose a new algorithm for estimating the mixing matrix. After the mixing matrix is estimated, the source components are estimated in the time-frequency domain using a linear programming method. In an example of the potential applications of our approach, we analyzed the EEG data that was obtained from a modified Sternberg memory experiment. Two almost uncorrelated components obtained by applying the sparse factorization method were selected for phase synchronization analysis. Several interesting findings were obtained, especially that memory-related synchronization and desynchronization appear in the alpha band, and that the strength of alpha band synchronization is related to memory performance.