
1.

蔡兆克鲍亮徐冬梅《计算机工程与科学》2017,39(8):1425-1430

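A preconditioned squared Smith algorithm is proposed for solving large-scale continuous Sylvester matrix equations. The algorithm uses alternating direction implicit (ADI) iteration to construct the preconditioning operator, transforms the original equation into a nonsymmetric Stein equation, and applies the squared Smith method in a Krylov subspace to iteratively generate a low-rank approximate solution. Numerical experiments show that, compared with known algorithms such as Jacobi iteration, the algorithm achieves better iteration efficiency and convergence accuracy.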
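The squared Smith step named in this abstract can be illustrated on a small dense Stein equation X = A X B + C. The sketch below is only an illustration under stated assumptions, not the authors' method: it omits the ADI preconditioning, the Krylov subspace projection, and the low-rank factorization, and the helper names (`matmul`, `matadd`, `squared_smith`, `stein_residual`) are hypothetical. It relies on the standard fact that the iteration converges when the product of the spectral radii of A and B is below 1.

```python
def matmul(P, Q):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matadd(P, Q):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def squared_smith(A, B, C, steps=6):
    """Approximate X solving X = A X B + C (assumes rho(A) * rho(B) < 1)."""
    X, Ak, Bk = C, A, B
    for _ in range(steps):
        # Doubling step: X gains as many series terms as it already holds.
        X = matadd(X, matmul(matmul(Ak, X), Bk))
        # Square the coefficient matrices for the next doubling step.
        Ak, Bk = matmul(Ak, Ak), matmul(Bk, Bk)
    return X


def stein_residual(A, B, C, X):
    """Largest entry of |A X B + C - X|."""
    R = matadd(matmul(matmul(A, X), B), C)
    return max(abs(r - x) for rr, rx in zip(R, X) for r, x in zip(rr, rx))
```

After k steps the iterate equals the first 2^k terms of the series sum of A^j C B^j, which is why a handful of steps is usually enough when the spectral radii are well below 1.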

2.

An improved iterative algorithm for analog circuit simulation

WANG Jia-fang 叶以正《微处理机》2008,29(2)

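Simulating an analog circuit ultimately reduces to solving a system of linear algebraic equations. A block partitioning method can reduce the dimension of the Jacobi matrix in the solution process and thereby effectively reduce solution time. Reducing the number of iterations needed to solve the linear system is another important way to reduce solution time. The paper first analyzes in detail the partitioning of the Jacobi matrix in the algebraic equations of analog circuits, then proposes an improved implicit iterative method. Finally, experiments analyze the effect of the number of inner iterations Iin on the total number of iterations; this conclusion offers guidance for improving the overall speedup.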

3.

FPGA design and implementation of a conjugate gradient solver

宋庆增顾军华《计算机应用》2011,31(9):2571-2573

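To address the low execution efficiency and poor real-time performance of software implementations of the conjugate gradient (CG) iterative algorithm, a CG iterative solver based on a field-programmable gate array (FPGA) platform is proposed. The design builds the whole system by combining software and hardware: a CG coprocessor executes the computation-heavy, control-simple code of the CG iteration to achieve hardware acceleration, while the control-heavy, computation-light code still runs on the microprocessor. The design uses a row-interleaved data flow so that the whole system runs entirely without stalls, which improves computing performance. Experimental results show that, compared with software execution, the hardware CG coprocessor achieves a speedup of up to 5.7 times.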
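For orientation, the CG iteration that such a coprocessor accelerates can be written as a short software sketch. This is the textbook algorithm for a symmetric positive definite system, not the paper's hardware design; the function names are hypothetical. The matrix-vector product and the dot products are the computation-heavy, control-simple kernels that a design like this would plausibly offload, though that mapping is an assumption here.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the zero initial guess
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` approximates the exact solution (1/11, 7/11).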

4.

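Storage of large-scale finite element stiffness matrices and a parallel algorithm for their solution (total citations: 1; self-citations: 0; citations by others: 1)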

纪国良冯仰德《数值计算与计算机应用》2012,33(3):230-240

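This paper proposes a method that assembles finite element stiffness matrices directly into a global stiffness matrix in compressed format, and designs a preconditioned restarted GMRES(m) parallel solver for the resulting linear system. The assembly method uses an "associated node" data structure that records the connectivity information of the nodes in the mesh and serves as an intermediary during assembly. The method saves a large amount of storage and is simple and efficient. The solver uses Jacobi and sparse approximate inverse (SPAI) preconditioners. Numerical experiments on two- and three-dimensional elasticity problems show that in the two-dimensional case the SPAI preconditioner accelerates convergence well and has good parallel efficiency, while in the three-dimensional case the Jacobi preconditioner reduces the time to convergence more.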

5.

An analytical placement algorithm for large-scale FPGAs

高文超周强吕勇强闫海霞钱旭《计算机辅助设计与图形学学报》2011,23(11)

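Based on the structural characteristics of FPGAs and drawing on the nonlinear modeling ideas used in ASIC placement algorithms, an analytical placement algorithm for large-scale FPGAs is proposed. The algorithm takes nonlinear wirelength as its objective and uses a conjugate gradient method with relatively few iterations as the solver, addressing the heavy time consumption of combinatorial optimization methods. Experimental results show that the method obtains good placement quality in a short time; a comparison with the results of FastPlace demonstrates its effectiveness.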

6.

Research and implementation of an efficient Jacobi iteration algorithm on GPUs

狄鹏胡长军李建江《小型微型计算机系统》2012,33(9):1962-1967

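Jacobi iteration is a loop computation commonly used to solve systems of partial differential equations. Data dependences between the statements of the algorithm hinder its efficient implementation on parallel computing platforms such as graphics processing units (GPUs). Through mathematical proof and experimental verification, different loop optimization strategies are compared in order to eliminate the inter-statement data dependences and improve data locality, thereby achieving higher execution performance. In addition, a tile size selection model is used to partition the computation data sensibly and make full use of the GPU's computing resources, further improving performance. Experimental results show that the Jacobi odd-even copy algorithm performs more than 4 times better on a GPU than the traditional parallel Jacobi algorithm.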
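The inter-statement dependence mentioned here typically arises because a Jacobi sweep has two statements: one that computes new values from old ones, and one that copies the new values back. A common way to drop the copy-back statement is to alternate the roles of two buffers on odd and even sweeps. The sketch below shows that idea on a 1D Laplace stencil; it is an assumption that this resembles the paper's odd-even copy scheme, it runs serially rather than on a GPU, and `jacobi_1d` is a hypothetical name.

```python
def jacobi_1d(u, sweeps):
    """Run Jacobi sweeps for u'' = 0 on a 1D grid with fixed end values."""
    old = list(u)
    new = list(u)            # both buffers carry the boundary values
    for _ in range(sweeps):
        # Every interior point reads only from `old`, so all updates in
        # one sweep are independent and could run in parallel.
        for i in range(1, len(old) - 1):
            new[i] = 0.5 * (old[i - 1] + old[i + 1])
        # Swap buffer roles instead of copying `new` back into `old`.
        old, new = new, old
    return old               # after the swap, `old` holds the latest sweep
```

With end values 0 and 1, the iterates approach the straight line between them, which is the exact solution of the discrete problem.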

7.

Low-complexity soft-output signal detection for the massive MIMO uplink

申东赵丹李强邸敬《计算机应用研究》2021,38(5):1524-1528

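For the case of high-dimensional channel matrices and complex received signals, a hybrid iterative algorithm suited to uplink signal detection in massive MIMO systems is proposed, combining the adaptive damped Jacobi (DJ) algorithm with the conjugate gradient (CG) algorithm. First, the CG algorithm provides an effective search direction for the adaptive damped Jacobi iteration. Next, a Chebyshev method is proposed to eliminate the influence of the relaxation parameter on signal detection, lowering the algorithm's complexity while accelerating convergence. Finally, soft information is approximated using the bit likelihood ratios of channel coding and decoding to improve detection performance. The complexity of the algorithm is analyzed theoretically, simulations compare the bit error rates of different detection algorithms under different decision schemes, and the convergence of the hybrid iterative algorithm is analyzed. Simulation results show that the hybrid iterative algorithm converges quickly within a small number of iterations and comes close to the optimal MMSE detection performance, with complexity far below that of the MMSE algorithm.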

8.

Hardware implementation of an FFT-based fast Poisson equation solver

李国燕顾军华宋庆增陆益财周博君《计算机测量与控制》2013,(1):250-253

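Traditional algorithms for solving the Poisson equation suffer from low execution efficiency and high power consumption and struggle to meet practical needs. To address this, a fast Poisson equation solver on an FPGA hardware platform is designed. The design combines software and hardware: software performs the control-heavy, computation-light tasks, and hardware performs the control-simple, computation-heavy tasks, achieving hardware acceleration. On the FPGA platform, an independently designed FFT coprocessor processes data in a pipelined and highly parallel manner, which improves the solver's performance. Experimental results show that the hardware-implemented FFT-based fast Poisson solver has high computing performance and good scalability.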

9.

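A PVM-based networked parallel iterative algorithm for systems of linear equations (total citations: 1; self-citations: 0; citations by others: 1)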

尚月强杨一都《计算机应用与软件》2006,23(11):50-51

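In a network parallel computing environment formed by connecting desktop PCs under PVM, the processors compute quickly while communication between them is comparatively slow. For this situation, a grouped Gauss-Seidel parallel iterative algorithm for solving systems of linear equations is proposed. The algorithm stores the augmented matrix of the linear system in row blocks across the processors, and each processor performs Gauss-Seidel iterations on its own block; communication between processors is light and the algorithm is easy to implement. Numerical experiments were run on a local area network of 1 to 24 desktop PCs, programmed on a parallel computing platform of PVM 3.4 on Windows 2000 with VC 6.0. The results show that the algorithm outperforms both the traditional parallel Jacobi iteration and the traditional parallel Gauss-Seidel iteration.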
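One plausible reading of the grouped scheme is sketched below: rows are split into blocks, each block runs a Gauss-Seidel pass using its own freshest values, and values owned by other blocks are taken from the previous exchange. This is a serial simulation under that assumption, not the paper's PVM code; the name `grouped_gauss_seidel`, the `blocks` argument, and the exchange-once-per-sweep policy are all hypothetical. Convergence is assumed here only for strictly diagonally dominant matrices.

```python
def grouped_gauss_seidel(A, b, blocks, sweeps=100):
    """Solve A x = b with Gauss-Seidel inside each row block.

    `blocks` is a list of row-index groups, one per simulated worker.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        snapshot = list(x)           # values exchanged at the last sync
        for rows in blocks:          # each block could run on its own worker
            local = list(snapshot)   # the worker's private copy of x
            for i in rows:
                # Uses fresh values for rows already updated in this block,
                # stale snapshot values for rows owned by other blocks.
                s = sum(A[i][j] * local[j] for j in range(n) if j != i)
                local[i] = (b[i] - s) / A[i][i]
            for i in rows:
                x[i] = local[i]      # publish only the rows this block owns
    return x
```

Because each block only publishes its own rows once per sweep, the communication volume matches a Jacobi-style exchange while the work inside a block keeps the faster Gauss-Seidel update.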

10.

A parallel FPGA SAT solver based on an incomplete algorithm

黎铁军马柯帆张建民《计算机工程与科学》2021,43(12):2126-2130

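The satisfiability problem is a core problem in computer theory and applications. A parallel solver based on an incomplete algorithm, pprobSAT+, is proposed on an FPGA. A multithreading strategy reduces the waiting time of the relevant components and improves solver efficiency. In addition, the threads use a data storage structure that shares addresses and clause information, reducing the resource overhead of on-chip memory. The pprobSAT+ solver reaches its best performance when all data is stored in the FPGA's on-chip memory. Experimental results show that the proposed pprobSAT+ solver achieves a speedup of more than 2 times over a single-threaded solver.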

11.

Parallel algorithms for solving systems of semilinear elliptic equations

徐焱宋君强《计算机工程与科学》1999,21(2):51-55

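Combining domain decomposition, the multigrid method, the accelerated Schwarz convergence method, the upper and lower solution method, nonlinear Jacobi iteration, and Newton linearization iteration, this paper designs three parallel algorithms for solving semilinear elliptic equations (or systems): a parallel Newton multigrid algorithm, a parallel nonlinear multigrid algorithm, and a parallel accelerated Schwarz convergence algorithm. Numerical experiments show that parallel computation with all three algorithms is feasible.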

12.

A parallel algorithm for solving systems of nonlinear equations

汪保孙秦《计算机工程与应用》2011,47(2):49-51

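A parallel algorithm for solving systems of nonlinear equations in a distributed environment is proposed. The algorithm applies a suitable splitting to the Jacobi matrix in Newton's iteration so that Newton's method parallelizes well, and a theoretical convergence analysis is given. Numerical experiments on an HP rx2600 cluster show a parallel efficiency above 70%.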

13.

Fractional Jacobi Galerkin spectral schemes for multi-dimensional time fractional advection–diffusion–reaction equations

Hafez Ramy M. Hammad Magda Doha Eid H. 《Engineering with Computers》2020,38(1):841-858

One of the ongoing issues with time fractional diffusion models is the design of efficient high-order numerical schemes for the solutions of limited regularity. We construct in this paper two efficient Galerkin spectral algorithms for solving multi-dimensional time fractional advection–diffusion–reaction equations with constant and variable coefficients. The model solution is discretized in time with a spectral expansion of fractional-order Jacobi orthogonal functions. For the space discretization, the proposed schemes accommodate high-order Jacobi Galerkin spectral discretization. The numerical schemes do not require imposition of artificial smoothness assumptions in time direction as is required for most methods based on polynomial interpolation. We illustrate the flexibility of the algorithms by comparing the standard Jacobi and the fractional Jacobi spectral methods for three numerical examples. The numerical results indicate that the global character of the fractional Jacobi functions makes them well-suited to time fractional diffusion equations because they naturally take the irregular behavior of the solution into account and thus preserve the singularity of the solution.

14.

JACK: an asynchronous communication kernel library for iterative algorithms

Frédéric Magoulès Guillaume Gbikpi-Benissan 《The Journal of supercomputing》2017,73(8):3468-3487

This article presents a new communication library developed to ease the implementation of both asynchronous and synchronous iterative methods. A mathematical and algorithmic framework about fixed-point methods is described to introduce this class of parallel iterative algorithms, although this library can be used for a larger class of parallel algorithms. After an overview of the main features, we describe detailed implementation aspects arising from the asynchronous context. While the library is mainly built on top of the Message Passing Interface library, it has been designed to be easily extended to other types of communication middleware. Finally, some numerical experiments validate this new library, used for implementing both a classical parallel scheme and a sub-structuring approach of the Jacobi iterative method.

15.

Algorithms for solving numerical linear algebra problems on supercomputers

T. J. Dekker W. Hoffmann P. P. M. De Rijk 《Future Generation Computer Systems》1989,4(4):255-263

In this paper some numerical algorithms are considered in relation to computations on vector and parallel processors. Moreover, some results of experiments on a vector computer are reported.

The algorithms considered are Gaussian elimination and Gauss-Jordan elimination for solving full linear systems, and Hestenes' one-sided Jacobi iteration to calculate the Singular Value Decomposition of a matrix.

16.

Built-in self-diagnosis for repairable embedded RAMs

Treuer R. Agarwal V.K. 《Design & Test of Computers, IEEE》1993,10(2):24-33

A method of built-in self-diagnosis (BISD) for repairable, embedded static RAMs (SRAMs) is presented. The BISD circuit, with self-repair, requires about 5% extra area in a 64-kb SRAM. The circuit contains a small reduced-instruction-set processor, which executes diagnosis algorithms stored in a ROM. These algorithms employ hybrid serial/parallel operations when external repair is available or modular operations when self-repair is required. The algorithms, hardware design, and design costs and tradeoffs are discussed.

17.

Tree-Based Parallel Algorithm Design

G. L. Miller S. -H. Teng 《Algorithmica》1997,19(4):369-389

In this paper a systematic method for the design of efficient parallel algorithms for the dynamic evaluation of computation trees and/or expressions is presented. This method involves the use of uniform closure properties of certain classes of unary functions. Using this method, optimal parallel algorithms are given for many computation tree problems which are important in parallel algebraic and numerical computation, and parallel code generation on exclusive read and exclusive write parallel random access machines. Our algorithmic result is complemented by a P-complete tree problem. Received February 13, 1995; revised March 25, 1996.

18.

An improved parallel Jacobi method for diagonalizing a symmetric matrix

Alan H. Karp John Greenstadt 《Parallel Computing》1987,5(3):281-294

We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations.

The classical Jacobi method is known to take O(n⁴) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n³ log₂n) time. One of our new methods also takes O(n³ log₂n) time, but the other one takes only O(n³) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled.

Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems.
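As a reference point for the methods this abstract compares, the sequential sweep variant can be sketched in a few lines: visit every off-diagonal pair (p, q) in a fixed order and apply a plane rotation that zeroes that entry. This is the textbook cyclic Jacobi method, not the authors' new rotation-selection schemes; the function name and the fixed sweep count are assumptions made for illustration.

```python
import math


def jacobi_eigenvalues(S, sweeps=10):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    A = [list(row) for row in S]     # work on a copy
    n = len(A)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Angle with tan(2*theta) = 2*a_pq / (a_qq - a_pp),
                # restricted to |theta| <= pi/4.
                d = A[q][q] - A[p][p]
                if d == 0.0:
                    theta = math.copysign(math.pi / 4.0, A[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * A[p][q] / d)
                c, s = math.cos(theta), math.sin(theta)
                # A <- A J: update columns p and q.
                for k in range(n):
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                # A <- J^T A: update rows p and q.
                for k in range(n):
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))
```

Rotations on disjoint index pairs touch disjoint rows and columns, which is the property that parallel sweep orderings exploit.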

19.

A level set method for structural shape and topology optimization using radial basis functions

Zhen Luo Liyong Tong Zhan Kang 《Computers & Structures》2009,87(7-8):425-434

This paper presents an alternative level set method for shape and topology optimization of continuum structures. An implicit free boundary representation model is established by embedding structural boundary into the zero level set of a higher-dimensional level set function. An explicit parameterization scheme for the level set surface is proposed by using radial basis functions with compact support. In doing so, the originally more difficult shape and topology optimization, driven by the temporal and spatial Hamilton–Jacobi partial differential equation (PDE), is transformed into a relatively easier size optimization of the expansion coefficients of the basis functions. The design optimization is converted to an iterative numerical process that combines the parameterization with a derivation of the shape sensitivity of the design functions, so as to allow using mathematical programming algorithms to solve the level set-based design problem and avoid directly solving the Hamilton–Jacobi PDE. Furthermore, a numerically more stable and efficient volume integration scheme is proposed to implement calculations of the shape derivatives, leading to the creation of new holes which are generated initially along the boundary and then propagated to the interior of the design domain. Two widely studied examples are used to demonstrate the effectiveness of the proposed optimization method.

20.

Parallel algorithms for the solution of certain large sparse linear systems

《国际计算机数学杂志》2012,89(4):245-260

A couple of approximate inversion techniques are presented which provide a parallel enhancement to several iterative methods for solving linear systems arising from the discretization of boundary value problems. In particular, the Jacobi, Gauss-Seidel, and successive overrelaxation methods can be improved substantially in a parallel environment by the extensions considered. A special case convergence proof is presented. The use of our approximate inverses with the preconditioned conjugate gradient method is examined and comparisons are made with some recently proposed algorithms in this area that also employ approximate inverses. The methods considered are compared under sequential and parallel hardware assumptions.