Similar literature
 20 similar documents found
1.
Many computational applications rely heavily on numerical linear algebra operations. Many of these applications are data- and computation-intensive and need to run in high-performance computing environments. The ACTS Collection brings robust and high-end software tools to the hands of application developers. However, this transfer of technology is not always successful, due in part to the intricacy of the interfaces associated with the software tools. To alleviate this, we present PyACTS, a set of Python-based interfaces to some of the tools in the ACTS Collection. We illustrate some examples of these interfaces and evaluate not only their performance but also how user-friendly they are compared to the original calls. We also present some examples of scientific applications that use PyACTS.

2.
The user modeling shell system BGP-MS   (total citations: 1; self-citations: 0; citations by others: 1)
BGP-MS is a user modeling shell system that can assist interactive software systems in adapting to their current users by taking the users' presumed knowledge, beliefs, and goals into account. It offers applications several methods for communicating observations concerning the user to BGP-MS, and for obtaining information on currently held assumptions about the user from BGP-MS. It provides a choice of two integrated formalisms for representing beliefs and goals, and includes several types of inferences for drawing additional assumptions based on an initial interview, observed user actions, and stereotypical knowledge about pre-defined user subgroups. BGP-MS is a customizable software system that is independent from applications, operates concurrently with them, and interacts with them through inter-process communication. For tailoring BGP-MS to a specific application domain, the developer must select those components of BGP-MS that are needed in this domain and fill them with relevant domain-dependent user modeling knowledge. This paper first summarizes the user modeling services that BGP-MS provides to application programs at runtime. It discusses the representational and inferential foundations that determine the scope and the limits of these services, and also gives a detailed example illustrating the interaction between the various system components. It describes interfaces that are available to application developers for tailoring BGP-MS to the specific user modeling needs of their application domains. Finally, it compares the system with all other major user modeling shell systems, and describes a first application that employs BGP-MS for adapting hypertext to users' terminological knowledge. The managing UMUAI editor for this paper was Sandra Carberry, University of Delaware.

3.
We present a linear algebra framework for structured matrices and general optimization problems. The matrices and matrix operations are defined recursively to efficiently capture complex structures and enable advanced compiler optimization. In addition to common dense and sparse matrix types, we define mixed matrices, which allow every element to be of a different type. Using mixed matrices, the low‐ and high‐level structure of complex optimization problems can be encoded in a single type. This type is then analyzed at compile time by a recursive linear solver that picks the optimal algorithm for the given problem. For common computer vision problems, our system yields a speedup of 3–5× compared to other optimization frameworks. The BLAS performance is benchmarked against the MKL library. We achieve a significant speedup in block‐SpMV and block‐SpMM. This work is implemented and released open‐source as a header‐only extension to the C++ math library Eigen.

4.
This article presents a parallel self-verified solver for dense linear systems of equations. This kind of solver is commonly used in many real applications that deal with large matrices. Nevertheless, two key problems limit the use of linear system solvers in a wider range of real applications: solution correctness and high computational cost. Verified computing is an interesting choice for addressing the first problem. An algorithm that uses this concept is able to find a highly accurate and automatically verified result, providing more reliability. However, the performance of these algorithms quickly becomes a drawback. To improve performance, parallel computing techniques were employed. Two main parts of this method were parallelized: the computation of the approximate inverse of matrix A and the preconditioning step. The results obtained show that these optimizations significantly increase the overall performance.

5.
Pen-based user interfaces that leverage the affordances of the pen provide users with more flexibility and more natural interaction. However, it is difficult to construct usable pen-based user interfaces because of the lack of support for their development. Toolkit-level support has been exploited to solve this problem, but this approach makes it hard to achieve platform independence, easy maintenance, and easy extension. In this paper, a context-aware infrastructure called WEAVER is created to provide pen interaction services for both novel pen-based applications and legacy GUI-based applications. WEAVER aims to support the pen as another standard interactive device along with the keyboard and mouse, and to present a high-level access interface to pen input. It employs application context to tailor its service to different applications. By modeling the application context and registering the relevant action adapters, WEAVER can offer services such as gesture recognition, continuous handwriting, and other fundamental ink manipulations to the appropriate applications. One of the distinct features of WEAVER is that off-the-shelf GUI-based software packages can be easily enhanced with pen interaction without modifying the existing code. In this paper, the architecture and components of WEAVER are described. In addition, examples of its use and user feedback are presented.

6.
The abundant data parallelism available in many-core GPUs has been of key interest for improving accuracy in scientific and engineering simulation. In many cases, most of the simulation time is spent in a linear solver involving sparse matrix–vector multiplication. In forward petroleum oil and gas reservoir simulation, the application of a stencil relationship to a structured grid leads to a family of generalized hepta-diagonal solver matrices with some regularity and structural uniqueness. We present a customized storage scheme that takes advantage of the generalized hepta-diagonal sparsity pattern and stencil regularity by optimizing both storage and matrix–vector computation. We also present an in-kernel optimization for implementing sparse matrix–vector multiply (SpMV) and the biconjugate gradient stabilized (BiCG-Stab) solver. The in-kernel approach is intended to avoid the multiple kernel invocations associated with the use of numerical library operators. To stay in-kernel, a lock-free inter-block synchronization is used in which completing thread blocks are assigned some independent computations to avoid repeatedly polling the global memory. Other optimizations enable combining reductions and collective write operations to memory. The in-kernel optimization is particularly useful for the iterative structure of BiCG-Stab, because it preserves vector data locality and avoids saving vector data back to memory and reloading it on each kernel exit and re-entry. The evaluation uses generalized hepta-diagonal matrices that derive from the structured grids of a range of forward reservoir simulations. Results show the profitability of the proposed generalized hepta-diagonal custom storage scheme over standard library storage formats such as compressed sparse row, hybrid sparse, and diagonal formats. Using the proposed optimizations, SpMV and BiCG-Stab have been noticeably accelerated compared to other implementations that use multiple kernel exits and re-entries, as happens when the solver is implemented by invoking numerical library operators.
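
As an illustration of the hepta-diagonal structure mentioned in this abstract, here is a minimal CPU sketch in plain Python. It assumes a 7-point stencil on an nx × ny × nz grid with natural ordering, so the seven diagonals sit at offsets 0, ±1, ±nx, and ±nx·ny. The band layout and the names `hepta_offsets` and `hepta_spmv` are hypothetical and chosen for illustration; this is ordinary diagonal storage, not the customized scheme or the GPU kernel of the paper.

```python
def hepta_offsets(nx, ny):
    """Diagonal offsets of a 7-point stencil on an nx*ny*nz grid (natural ordering)."""
    return [-nx * ny, -nx, -1, 0, 1, nx, nx * ny]


def hepta_spmv(diags, offsets, x):
    """Compute y = A @ x, where diags[d][i] holds the entry A[i, i + offsets[d]]."""
    n = len(x)
    y = [0.0] * n
    for band, off in zip(diags, offsets):
        # Only rows whose column index i + off stays inside [0, n) contribute.
        lo = max(0, -off)
        hi = min(n, n - off)
        for i in range(lo, hi):
            y[i] += band[i] * x[i + off]
    return y


if __name__ == "__main__":
    nx, ny, nz = 2, 2, 2
    n = nx * ny * nz
    offsets = hepta_offsets(nx, ny)
    # Illustrative values only: 6 on the main diagonal, -1 on the other bands.
    diags = [[6.0 if off == 0 else -1.0] * n for off in offsets]
    print(hepta_spmv(diags, offsets, [1.0] * n))
```

Because the column index is implied by the row index and the band offset, no per-entry column array is needed, which is the kind of regularity a stencil-aware format can exploit.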

7.
Virtual machines for remote execution are a useful tool for utilizing light user interfaces and intensive application cores in different physical machines connected through the Internet. In a virtual machine, application cores are distributed in a network. Specific locations, operating systems and hardware characteristics are hidden by virtual machines. They make it possible to use a PC to execute user interfaces and (a few) high‐performance computers for application cores. We present a Java/CORBA‐based brokerage platform that allows remote execution of optimization solvers from a client running on any platform. The system offers a dynamic library of available problem solvers, and a graphic interface to browse several defined properties and metadata on available solvers. In addition, an embedded file compression module to reduce data transfer time is included as a plug‐in feature of the proposed virtual machine. Analogous systems could be constructed for applications in which interaction traffic time is much lower than execution time. Copyright © 2001 John Wiley & Sons, Ltd.

8.
Interval Newton/Generalized Bisection methods reliably find all numerical solutions within a given domain. Both computational complexity analysis and numerical experiments have shown that solving the corresponding interval linear system generated by interval Newton's methods can be computationally expensive (especially when the nonlinear system is large). In applications, many large-scale nonlinear systems of equations result in sparse interval Jacobian matrices. In this paper, we first propose a general indexed storage scheme to store sparse interval matrices. We then present an iterative interval linear solver that utilizes the proposed index storage scheme. It is expected that the newly proposed general interval iterative sparse linear solver will improve the overall performance of interval Newton/Generalized Bisection methods when the Jacobian matrices are sparse. In Section 1, we briefly review interval Newton's methods. In Section 2, we review some currently used storage schemes for sparse systems. In Section 3, we introduce a new index scheme to store general sparse matrices. In Section 4, we present both sequential and parallel algorithms to evaluate a general sparse Jacobian matrix. In Section 5, we present both sequential and parallel algorithms to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.
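
To make the idea of an indexed store for sparse interval matrices concrete, here is a small sketch in plain Python. The abstract does not specify the storage scheme, so the row-indexed dictionary layout and the names `SparseIntervalMatrix`, `iadd`, and `imul` are assumptions made for illustration. The arithmetic also omits the outward (directed) rounding that a rigorous interval library would apply, so the results are not guaranteed enclosures in floating point.

```python
def iadd(a, b):
    """Interval sum: [a_lo + b_lo, a_hi + b_hi]."""
    return (a[0] + b[0], a[1] + b[1])


def imul(a, b):
    """Interval product: min and max over the four endpoint products."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


class SparseIntervalMatrix:
    """Row-indexed storage: rows[i] maps a column index to a (lo, hi) interval."""

    def __init__(self, n_rows):
        self.rows = [dict() for _ in range(n_rows)]

    def set(self, i, j, lo, hi):
        self.rows[i][j] = (lo, hi)

    def matvec(self, x):
        """Multiply by an interval vector x, touching only stored entries."""
        y = []
        for row in self.rows:
            acc = (0.0, 0.0)
            for j, a_ij in row.items():
                acc = iadd(acc, imul(a_ij, x[j]))
            y.append(acc)
        return y


if __name__ == "__main__":
    m = SparseIntervalMatrix(2)
    m.set(0, 0, 1.0, 2.0)
    m.set(1, 1, -1.0, 1.0)
    print(m.matvec([(1.0, 1.0), (2.0, 3.0)]))
```

An iterative interval solver spends most of its time in products like `matvec`, which is why skipping the structurally zero entries matters when the Jacobian is sparse.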

9.
A CPU-GPU hybrid approach for the unsymmetric multifrontal method   (total citations: 1; self-citations: 0; citations by others: 1)
The multifrontal method is an efficient direct method for solving large-scale sparse and unsymmetric linear systems. The method transforms a large sparse matrix factorization process into a sequence of factorizations involving smaller dense frontal matrices. Some of these dense operations can be accelerated by using a graphics processing unit (GPU). We analyze the unsymmetric multifrontal method from both algorithmic and implementation perspectives to see how a GPU, in particular the NVIDIA Tesla C2070, can be used to accelerate the computations. Our main acceleration strategies include (i) performing BLAS on both CPU and GPU, (ii) improving the communication efficiency between the CPU and GPU by using page-locked memory, zero-copy memory, and asynchronous memory copy, and (iii) a modified algorithm that reuses the memory between different GPU tasks and sets thresholds to determine whether certain tasks should be performed on the GPU. The proposed acceleration strategies are implemented by modifying UMFPACK, which is an unsymmetric multifrontal linear system solver. Numerical results show that the CPU-GPU hybrid approach can accelerate the unsymmetric multifrontal solver, especially for computationally expensive problems.

10.
The Linux operating system is quickly becoming a standard, attracting a wide user community and supporting a broad variety of applications and devices. Other vendors, such as Sun, have provided Linux‐compatible system call interfaces to their kernels, but are constrained by the lack of device support. To address this problem, we present a system (called PITS) to build device drivers, in this case for Solaris x86, from Linux source code. To accomplish this goal, we designed tools and Linux kernel emulation code to handle the myriad incompatibilities. These incompatibilities require the ability to resolve symbol conflicts, emulate internal Linux kernel data structures, handle module initialization, and generate module dependencies. With our method, we show that converting Linux device drivers is possible, but has a few technical difficulties. Issues arise with sparse documentation, external user interfaces, and modular driver implementations. There are also fundamental differences between the two operating systems, such as interrupt and DMA handling. We describe each of these issues and their current solutions to build a functional driver in the Solaris environment. Using the IOzone file system benchmark, we also demonstrate comparable performance between our generated SCSI driver set and their corresponding native counterparts. Copyright © 2006 John Wiley & Sons, Ltd.

11.
Methods for the solution of sparse eigenvalue problems that are based on spectral projectors and contour integration have recently attracted more and more attention. Such methods require the solution of many shifted sparse linear systems of full size. In most of the literature concerning these eigenvalue solvers, only a few words are said about the solution of the linear systems, but these systems turn out to be very hard to solve with iterative linear solvers in practice. In this work we identify a row projection method for the solution of the inner linear systems encountered in the FEAST algorithm and introduce a novel hybrid parallel and fully iterative implementation of the eigenvalue solver. Our approach ultimately aims at achieving extreme parallelism by exploiting the algorithm’s potential on several levels. We present numerical examples where graphene modeling is one of the target applications. In this application, several hundred or even thousands of eigenvalues from the interior of the spectrum are required, which is a big challenge for state-of-the-art numerical methods.

12.
Numerous engineering application systems have been developed over the past twenty years, and many of these applications will continue to be used for many years to come. Examples of such applications include CAD systems, finite-element analysis packages and inspection systems. Because many of these applications were developed before graphical workstations became available, they often have simple command-line user interfaces. Thus, there is a need for a graphical user interface management system (UIMS) that can be used to build point-and-click style interfaces for these existing engineering applications. In this paper we describe such a UIMS, and discuss its implementation using an object-oriented database tool. This UIMS allows users to create and modify user interfaces by editing graphical representations of the interfaces, thus eliminating the need to write code to build or modify an interface. The UIMS is implemented using an object-oriented database tool to take advantage of the data manipulation and storage management capabilities it provides. This approach reduces both the quantity and complexity of the code needed to implement the UIMS. It also allowed the UIMS to be implemented in a minimal amount of time.

13.
The focal underdetermined system solver (FOCUSS) is a powerful tool for sparse representation in complex underdetermined systems. This paper presents a fast FOCUSS method based on the bi-conjugate gradient (BICG) method, termed BICG-FOCUSS, to speed up the convergence rate of the original FOCUSS. BICG-FOCUSS was specifically designed to reduce the computational complexity of FOCUSS by solving a complex linear equation using the BICG method according to the rank of the weight matrix in FOCUSS. Experimental results show that BICG-FOCUSS is more efficient in terms of computational time than FOCUSS without losing accuracy. Since FOCUSS is an efficient tool for estimating the space-time clutter spectrum in sparse recovery-based space-time adaptive processing (SR-STAP), we propose BICG-FOCUSS to achieve a fast estimation of the space-time clutter spectrum in mono-static array radar and in the mountaintop system. The high performance of the proposed BICG-FOCUSS in the application is demonstrated with both simulated and real data.

14.
We present a novel GPU‐based approach to robustly and efficiently simulate high‐resolution and complexly layered cloth. The key component of our formulation is a parallelized matrix assembly algorithm that can quickly build a large and sparse matrix in a compressed format and accurately solve linear systems on GPUs. We also present a fast and integrated solution for parallel collision handling, including collision detection and response computations, which utilizes spatio‐temporal coherence. We combine these algorithms as part of a new cloth simulation pipeline that incorporates contact forces into implicit time integration for collision avoidance. The entire pipeline is implemented on GPUs, and we evaluate its performance on complex benchmarks consisting of 100–300K triangles. In practice, our system takes a few seconds to simulate one frame of a complex cloth scene, which represents significant speedups over prior CPU and GPU‐based cloth simulation systems.

15.
16.
Matrix computations are both fundamental and ubiquitous in computational science, and as a result, they are frequently used in numerous disciplines of scientific computing and engineering. Because the high computational complexity of matrix operations makes them critical to the performance of a large number of applications, their efficient execution in distributed environments becomes a crucial issue. This work proposes a novel approach for distributing sparse matrix arithmetic operations on computer clusters, aiming at speeding up the processing of high-dimensional matrices. The approach focuses on how to split such operations into independent parallel tasks by considering the intrinsic characteristics that distinguish each type of operation and the particular matrices involved. The approach was applied to the most commonly used arithmetic operations between matrices. The performance of the presented approach was evaluated considering a high-dimensional text feature selection approach and two real-world datasets. Experimental evaluation showed that the proposed approach helped to significantly reduce the computing times of large-scale matrix operations when compared to serial and multi-thread implementations as well as several linear algebra software libraries.
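
The general idea of splitting a sparse matrix operation into independent tasks can be sketched as follows, in plain Python. This is only an assumed row-block partition of a sparse product C = A·B, with a local thread pool standing in for cluster nodes; the abstract does not describe the actual splitting strategy, and the names `split_rows`, `multiply_row_block`, and `parallel_spmm` are hypothetical. Each matrix is a list of rows, where a row is a dictionary mapping column index to value.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row_block(a_rows, b):
    """Multiply a block of sparse rows of A by the sparse matrix B."""
    out = []
    for row in a_rows:
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in b[k].items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        out.append(acc)
    return out


def split_rows(n_rows, n_tasks):
    """Return (start, stop) ranges that partition range(n_rows) as evenly as possible."""
    base, extra = divmod(n_rows, n_tasks)
    ranges, start = [], 0
    for t in range(n_tasks):
        stop = start + base + (1 if t < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def parallel_spmm(a, b, n_tasks=4):
    """Compute A @ B by handing independent row blocks of A to worker tasks."""
    blocks = [a[s:e] for s, e in split_rows(len(a), n_tasks)]
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # map preserves block order, so the results concatenate back into C.
        parts = list(pool.map(multiply_row_block, blocks, [b] * len(blocks)))
    return [row for part in parts for row in part]


if __name__ == "__main__":
    a = [{0: 1.0, 2: 2.0}, {}, {1: 3.0}]
    b = [{0: 1.0}, {1: 1.0, 2: 4.0}, {0: 5.0}]
    print(parallel_spmm(a, b, n_tasks=2))
```

Row blocks of the product depend only on their own rows of A and on B, so the tasks need no communication with each other. An even row split ignores the nonzero count per row, which a scheme that accounts for the particular matrices involved would presumably balance.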

17.
This paper focuses on application-level improvements in a sparse direct solver specifically used for large-scale unsymmetrical linear equations resulting from unstructured mesh discretization of coupled elliptic/hyperbolic PDEs. Existing sparse direct solvers are designed for distributed server systems, taking advantage of both distributed memory and processing units. We conducted extensive numerical experiments with three state-of-the-art direct linear solvers that can work on distributed-memory parallel architectures; namely, MUMPS (MUMPS solver website, http://graal.ens-lyon.fr/MUMPS), WSMP (Technical Report TR RC-21886, IBM, Watson Research Center, Yorktown Heights, 2000), and SUPERLU_DIST (ACM Trans Math Softw 29(2):110–140, 2003). The performance of these solvers was analyzed in detail, using advanced analysis tools such as Tuning and Analysis Utilities (TAU) and Performance Application Programming Interface (PAPI). The performance is evaluated with respect to robustness, speed, scalability, and efficiency in CPU and memory usage. We have determined application-level issues that we believe can improve the performance of a distributed-shared memory hybrid variant of this solver, which is proposed as an alternative solver [SuperLU_MCDT (Many-Core Distributed)] in this paper. The new solver, utilizing MPI/OpenMP hybrid programming, is specifically tuned to handle large unsymmetrical systems arising in reservoir simulations so that higher performance and better scalability can be achieved for a large distributed computing system with many nodes of multicore processors. Two main tasks are accomplished during this study: (i) comparisons of public domain solver algorithms: existing state-of-the-art direct sparse linear system solvers are investigated and their performance and weaknesses based on test cases are analyzed; (ii) improvement of the direct sparse solver algorithm (SuperLU_MCDT) for many-core distributed systems is achieved. We provide results of numerical tests that were run on up to 16,384 cores and used many sets of test matrices for reservoir simulations with unstructured meshes. The numerical results showed that SuperLU_MCDT can outperform SuperLU_DIST 3.3 in terms of both speed and robustness.

18.
In many high-performance engineering and scientific applications there is a need to use parallel software libraries. Researchers behind these applications find it difficult to understand the interfaces to these libraries because they carry arguments that are related to the parallel environment and performance in addition to arguments related to the problem at hand. In this paper we introduce the use of high-level user interfaces for ScaLAPACK. Concretely, a Python-based interface to ScaLAPACK is proposed. Numerical experiments comparing traditional programming practices with our proposed approach are presented. These experiments evaluate not only the performance of the Python interfaces but also how much more user-friendly they are compared to the original calls, and show that PyScaLAPACK does not hinder the performance delivered by ScaLAPACK. Finally, an example of a real scientific application code, whose functionality can be prototyped or extended with the use of PyScaLAPACK, is presented.

19.
Single- and multi-level iterative methods for sparse linear systems are applied to unsteady flow simulations via implementation into a direct numerical simulation solver for incompressible turbulent flows on unstructured meshes. The performance of these solution methods, implemented in the well-established SAMG and ML packages, is quantified in terms of computational speed and memory consumption, with a direct sparse LU solver (SuperLU) used as a reference. The classical test case of unsteady flow over a circular cylinder at low Reynolds numbers is considered, employing a series of increasingly fine anisotropic meshes. As expected, the memory consumption increases dramatically with the considered problem size for the direct solver. Surprisingly, however, the computation times remain reasonable. The speed and memory usage of pointwise algebraic and smoothed aggregation multigrid solvers are found to exhibit near-linear scaling. As an alternative to multi-level solvers, a single-level ILUT-preconditioned GMRES solver with low drop tolerance is also considered. This solver is found to perform sufficiently well only on small meshes. Even then, it is outperformed by pointwise algebraic multigrid on all counts. Finally, the effectiveness of pointwise algebraic multigrid is illustrated by considering a large three-dimensional direct numerical simulation case using a novel parallelization approach on a large distributed memory computing cluster.

20.
Fortran IV subroutines for the in-core solution of linear algebraic systems with a sparse, symmetrically skylined-stored nonsymmetric coefficient matrix are presented. Such systems arise in various computations, among which are the finite element discretization in conjunction with incremental continuum mechanics, or space-time finite elements for dynamical systems. These routines can be used for constrained systems without prearranging. The feature of partial decomposition is installed and its application to the analysis of singular matrices is discussed.
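
For readers unfamiliar with skyline storage, the following plain-Python sketch shows one common way to hold a nonsymmetric matrix under a symmetric skyline profile: each row keeps its entries from its first nonzero column up to the diagonal, and each column keeps its entries over the same index range above the diagonal. The class name `SkylineMatrix` and its layout are assumptions for illustration; the Fortran IV routines of the paper may organize the arrays differently.

```python
class SkylineMatrix:
    """Nonsymmetric values stored under a symmetric skyline profile.

    first[i] is the first stored index of row i (lower part) and, because the
    profile is symmetric, also of column i (upper part).
    """

    def __init__(self, first):
        self.first = list(first)
        n = len(self.first)
        self.diag = [0.0] * n
        # lower[i][k] holds A[i, first[i] + k]; upper[j][k] holds A[first[j] + k, j].
        self.lower = [[0.0] * (i - self.first[i]) for i in range(n)]
        self.upper = [[0.0] * (j - self.first[j]) for j in range(n)]

    def set(self, i, j, value):
        if i == j:
            self.diag[i] = value
        elif j < i:
            if j < self.first[i]:
                raise IndexError("entry lies outside the skyline profile")
            self.lower[i][j - self.first[i]] = value
        else:
            if i < self.first[j]:
                raise IndexError("entry lies outside the skyline profile")
            self.upper[j][i - self.first[j]] = value

    def matvec(self, x):
        n = len(self.diag)
        y = [self.diag[i] * x[i] for i in range(n)]
        for i in range(n):
            f = self.first[i]
            for k in range(i - f):
                y[i] += self.lower[i][k] * x[f + k]   # row i, strictly lower part
                y[f + k] += self.upper[i][k] * x[i]   # column i, strictly upper part
        return y


if __name__ == "__main__":
    a = SkylineMatrix(first=[0, 0, 1])
    entries = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 3.0), (1, 1, 4.0),
               (1, 2, 5.0), (2, 1, 6.0), (2, 2, 7.0)]
    for i, j, v in entries:
        a.set(i, j, v)
    print(a.matvec([1.0, 2.0, 3.0]))
```

Fill-in during LU factorization without pivoting stays inside the skyline profile, which is why this layout suits in-core direct solvers of the kind described.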
