1.

FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs

López-Albelda Bernabé Castro Francisco M. González-Linares José M. Guil Nicolás 《The Journal of supercomputing》2022,78(1):43-71

Nowadays, GPU clusters are available in almost every data processing center. Their GPUs are typically shared by different applications that might have different...

2.

Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs

Knap Marcin Czarnul Paweł 《The Journal of supercomputing》2019,75(11):7625-7645

3.

Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA

J. Habich T. Zeiser G. Hager G. Wellein 《Advances in Engineering Software》2011,42(5):266-272

This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI-express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.

4.

Parallel multi-objective Ant Programming for classification using GPUs

Alberto Cano Juan Luis Olmo Sebastián Ventura 《Journal of Parallel and Distributed Computing》2013

Classification using Ant Programming is a challenging data mining task which demands a great deal of computational resources when handling data sets of high dimensionality. This paper presents a new parallelization approach of an existing multi-objective Ant Programming model for classification, using GPUs and the NVIDIA CUDA programming model. The computational costs of the different steps of the algorithm are evaluated and it is discussed how best to parallelize them. The features of both the CPU parallel and GPU versions of the algorithm are presented. An experimental study is carried out to evaluate the performance and efficiency of the interpreter of the rules, and reports the execution times and speedups regarding variable population size, complexity of the rules mined and dimensionality of the data sets. Experiments measure the original single-threaded and the new multi-threaded CPU and GPU times with different numbers of GPU devices. The results are reported in terms of the number of Giga GP operations per second of the interpreter (up to 10 billion GPops/s) and the speedup achieved (up to 834× vs CPU, 212× vs 4-threaded CPU). The proposed GPU model is demonstrated to scale efficiently to larger datasets and to multiple GPU devices, which allows the expansion of its applicability to significantly more complicated data sets, previously unmanageable by the original algorithm in reasonable time.

5.

Geometric characterization and clustering of graphs using heat kernel embeddings

Bai Xiao Edwin R. Hancock Richard C. Wilson 《Image and vision computing》2010

In this paper, we investigate the use of heat kernels as a means of embedding the individual nodes of a graph in a vector space. The reason for turning to the heat kernel is that it encapsulates information concerning the distribution of path lengths and hence node affinities on the graph. The heat kernel of the graph is found by exponentiating the Laplacian eigensystem over time. In this paper, we explore how graphs can be characterized in a geometric manner using embeddings into a vector space obtained from the heat kernel. We explore two different embedding strategies. The first of these is a direct method in which the matrix of embedding co-ordinates is obtained by performing a Young–Householder decomposition on the heat kernel. The second method is indirect and involves performing a low-distortion embedding by applying multidimensional scaling to the geodesic distances between nodes. We show how the required geodesic distances can be computed using parametrix expansion of the heat kernel. Once the nodes of the graph are embedded using one of the two alternative methods, we can characterize them in a geometric manner using the distribution of the node co-ordinates. We investigate several alternative methods of characterization, including spatial moments for the embedded points, the Laplacian spectrum for the Euclidean distance matrix and scalar curvatures computed from the difference in geodesic and Euclidean distances. We experiment with the resulting algorithms on the COIL database.
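As a rough illustration of the direct embedding described in this abstract, the sketch below exponentiates the Laplacian eigensystem and factors the resulting heat kernel. It is only a minimal sketch under stated assumptions: the function name `heat_kernel_embedding` is hypothetical, the combinatorial Laplacian L = D - A is assumed (the abstract does not say which Laplacian is used), and the factorization Y = exp(-Λt/2) Φ^T with Y^T Y = h_t is one standard reading of a Young–Householder decomposition, not necessarily the authors' exact procedure.

```python
import numpy as np


def heat_kernel_embedding(adjacency, t):
    """Return (H, Y): the heat kernel at time t and node coordinates with Y.T @ Y == H.

    Assumption: the combinatorial Laplacian L = D - A of an undirected graph is used.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    # Eigensystem of the symmetric Laplacian: L = Phi @ diag(lam) @ Phi.T
    lam, phi = np.linalg.eigh(laplacian)
    # Heat kernel: exponentiate the eigenvalues over time t
    H = phi @ np.diag(np.exp(-t * lam)) @ phi.T
    # Factor H as Y.T @ Y; column i of Y holds the coordinates of node i
    Y = np.diag(np.exp(-0.5 * t * lam)) @ phi.T
    return H, Y


# Usage: a three-node path graph (illustrative data only)
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
H, Y = heat_kernel_embedding(path_graph, t=0.5)
```

Small values of t keep the kernel close to the identity, so the embedding reflects mostly local structure; larger t lets longer paths contribute.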

6.

Evaluation of disconnected quark loops for hadron structure using GPUs

C. Alexandrou M. Constantinou V. Drach K. Hadjiyiannakou K. Jansen G. Koutsou A. Strelchenko A. Vaquero 《Computer Physics Communications》2014

A number of stochastic methods developed for the calculation of fermion loops are investigated and compared, in particular with respect to their efficiency when implemented on Graphics Processing Units (GPUs). We assess the performance of the various methods by studying the convergence and statistical accuracy obtained for observables that require a large number of stochastic noise vectors, such as the isoscalar nucleon axial charge. The various methods are also examined for the evaluation of sigma-terms, where noise reduction techniques specific to the twisted mass formulation can be utilized, thus reducing the required number of stochastic noise vectors.

7.

ARTK-M2: A kernel for Ada tasking requirements: An implementation and an automatic generator

Jorge L. Díaz-Herrera Ronald D. Graft Douglas B. Rupp 《Software》1992,22(4):317-348

A run-time kernel, ARTK-M2, supporting Ada tasking semantics is discussed; full support for task creation, synchronization, communication, scheduling, and termination is provided, together with all options of the Ada rendezvous. An implementation in Modula-2 is presented and a method for automatically translating Ada programs into semantically equivalent Modula-2 programs with corresponding kernel calls is introduced. A parser generator and an attribute grammar were used for the automatic translation. A subset of the Ada Compiler Validation Capability was processed to test the implementation and to illustrate the translation mechanism. The kernel is applicable to the study of real-time control systems; it can also serve as a baseline for studying implementation alternatives of Ada concepts, such as new scheduling algorithms, and for analysing new language constructs. Work is under way to implement some of the changes to the Ada tasking model being proposed as a result of the language revision (Ada9X). Finally, through proper extensions, ARTK-M2 can form an integral part of programming tools such as an Ada compilation system and a distributed kernel for multi-processing environments.

8.

Some criteria for the evaluation of graphics implementation languages

Jeffrey L. Posdamer 《Computers & Graphics》1977,2(2):91-95

The choice of an implementation language for a graphics system impacts the ease of programming, range of functions, and overall nature of the system. By analyzing the functions of data and representation, computation and control, a set of criteria for choosing a graphics implementation language is synthesized.

9.

Blind motion image deblurring using an effective blur kernel prior

Javaran Taiebeh Askari Hassanpour Hamid Abolghasemi Vahid 《Multimedia Tools and Applications》2019,78(16):22555-22574

Blind image deblurring, i.e., reconstructing a sharp version of a blurred image, is generally an ill-posed problem, as both the blur kernel and the sharp image...

10.

Semiautomatic implementation of protocols using an Estelle-C compiler

Vuong S.T. Lau A.C. Chan R.I. 《IEEE transactions on software engineering》1988,14(3):384-393

The basic ideas underlying an Estelle-C compiler, which accepts an Estelle protocol specification and produces a protocol implementation in C, are presented. The implementation of the ISO (International Organization for Standardization) class-2 transparent protocol, using the semiautomatic approach, is discussed. A manual implementation of the protocol is performed and compared to the semiautomatic implementation. The semiautomatic approach to protocol implementation offers several advantages over the conventional manual one, including correctness and modularity in protocol implementation code, conformance to the specification, and reduction in implementation time. Finally, ongoing development of a new Estelle-C compiler is presented.

11.

A hybrid parallel solver for systems of multivariate polynomials using CPUs and GPUs

Cheon-Hyeon Park Gershon Elber Ku-Jin Kim Gye-Young Kim Joon-Kyung Seong 《Computer aided design》2011,43(11):1360-1369

This paper deals with a problem of finding valid solutions to systems of polynomial constraints. Although there have been several quite successful algorithms based on domain subdivision to resolve this problem, some major issues are still demanding further research. Prime obstacles in developing an efficient subdivision-based polynomial constraint solver are the exhaustive, although hierarchical, search of the zero-set in the parameter domain, which is computationally demanding, and their scalability in terms of the number of variables. In this paper, we present a hybrid parallel algorithm for solving systems of multivariate constraints by exploiting both the CPU and the GPU multicore architectures. We dedicate the CPU for the traversal of the subdivision tree and the GPU for the multivariate polynomial subdivision. By decomposing the constraint solving technique into two different components, hierarchy traversal and polynomial subdivision, each of which is more suitable to CPUs and GPUs, respectively, our solver can fully exploit the availability of hybrid, multicore architectures of CPUs and GPUs. Furthermore, our GPU-based subdivision method takes advantage of the inherent parallelism in the multivariate polynomial subdivision. We demonstrate the efficacy and scalability of the proposed parallel solver through several examples in geometric applications, including Hausdorff distance queries, contact point computations, surface–surface intersections, ray trap constructions, and bisector surface computations. In our experiments, the proposed parallel method achieves up to two orders of magnitude improvement in performance compared to the state-of-the-art subdivision-based CPU solver.

12.

Learning bounds for kernel regression using effective data dimensionality

Zhang T 《Neural computation》2005,17(9):2077-2098

Kernel methods can embed finite-dimensional data into infinite-dimensional feature spaces. In spite of the large underlying feature dimensionality, kernel methods can achieve good generalization ability. This observation is often wrongly interpreted, and it has been used to argue that kernel learning can magically avoid the curse-of-dimensionality phenomenon encountered in statistical estimation problems. This letter shows that although, using kernel representation, one can embed data into an infinite-dimensional feature space, the effective dimensionality of this embedding, which determines the learning complexity of the underlying kernel machine, is usually small. In particular, we introduce an algebraic definition of a scale-sensitive effective dimension associated with a kernel representation. Based on this quantity, we derive upper bounds on the generalization performance of some kernel regression methods. Moreover, we show that the resulting convergent rates are optimal under various circumstances.

13.

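Video retrieval algorithm based on kernel vector filtering

肖国强 罗国兵 《电子技术应用》2006,32(4):42-44

Video retrieval is a computation in high-dimensional space. To address the heavy computational cost of working in high dimensions, an algorithm that constructs a kernel vector is proposed. It maps the high-dimensional space to a low-dimensional space, filters out dissimilar data dimension by dimension in the low-dimensional space, narrows the retrieval range, and improves retrieval speed.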

14.

Multiclass multiple kernel learning using hypersphere for pattern recognition

Yu Guo Huaitie Xiao 《Applied Intelligence》2018,48(9):2746-2754

15.

Stochastic adaptive sampling for mobile sensor networks using kernel regression

Yunfei Xu Jongeun Choi 《International Journal of Control, Automation and Systems》2012,10(4):778-786

In this paper, we provide a stochastic adaptive sampling strategy for mobile sensor networks to estimate scalar fields over surveillance regions using kernel regression, which does not require a priori statistical knowledge of the field. Our approach builds on a Markov Chain Monte Carlo (MCMC) algorithm, viz., the fastest mixing Markov chain under a quantized finite state space, for generating the optimal sampling probability distribution asymptotically. The proposed adaptive sampling algorithm for multiple mobile sensors is numerically evaluated under scalar fields. The comparison simulation study with a random walk benchmark strategy demonstrates the excellent performance of the proposed scheme.

16.

KAGO: an approximate adaptive grid-based outlier detection approach using kernel density estimate

Bhattacharjee Panthadeep Garg Ankur Mitra Pinaki 《Pattern Analysis & Applications》2021,24(4):1825-1846

Outlier detection approaches show their efficacy while extracting unforeseen knowledge in domains such as intrusion detection, e-commerce, and fraudulent...

17.

Nonlinear feature selection using Gaussian kernel SVM-RFE for fault diagnosis

Yangtao Xue Li Zhang Bangjun Wang Zhao Zhang Fanzhang Li 《Applied Intelligence》2018,48(10):3306-3331

Feature selection can directly ascertain causes of faults by selecting useful features for fault diagnosis, which can simplify the procedures of fault diagnosis. As an efficient feature selection method, the linear kernel support vector machine recursive feature elimination (SVM-RFE) has been successfully applied to fault diagnosis. However, fault diagnosis is not a linear issue. Thus, this paper introduces the Gaussian kernel SVM-RFE to extract nonlinear features for fault diagnosis. The key issue is the selection of the kernel parameter for the Gaussian kernel SVM-RFE. We introduce three classical and simple kernel parameter selection methods and compare them in experiments. The proposed fault diagnosis framework combines the Gaussian kernel SVM-RFE and the SVM classifier, which can improve the performance of fault diagnosis. Experimental results on the Tennessee Eastman process indicate that the proposed framework for fault diagnosis is an advanced technique.

18.

Optimizing fuzzy neural networks for tuning PID controllers using an orthogonal simulated annealing algorithm OSA

Ho S.-J. Li-Sun Shu Shinn-Ying Ho 《Fuzzy Systems, IEEE Transactions on》2006,14(3):421-434

In this paper, we formulate an optimization problem of establishing a fuzzy neural network model (FNNM) for efficiently tuning proportional-integral-derivative (PID) controllers of various test plants with under-damped responses, using a large number P of training plants such that the mean tracking error J of the obtained P control systems is minimized. The FNNM consists of four fuzzy neural networks (FNNs), where each FNN models one of the controller parameters (K, T_i, T_d, and b) of PID controllers. An existing indirect, two-stage approach used a dominant pole assignment method with P=198 to find the corresponding PID controllers. Subsequently, an adaptive neuro-fuzzy inference system (ANFIS) is used to independently train the four individual FNNs, taking as input 176 of the 198 PID controllers; the 22 controllers whose parameters show large variation are discarded. The innovation of the proposed approach is to directly and simultaneously optimize the four FNNs by using a novel orthogonal simulated annealing algorithm (OSA). The high performance of the OSA-based approach arises because OSA can effectively optimize the many parameters of the FNNM to minimize J. It is shown that the OSA-based FNNM with P=176 improves on the ANFIS-based FNNM, decreasing on average the error J by 13.08% and the tracking error of the 22 test plants by 88.07%, by refining the solution of the ANFIS-based method. Furthermore, the OSA-based FNNMs using P=198 and 396 from an extensive tuning domain perform similarly well to that using P=176 in terms of J.

19.

Fast implementation of kernel simplex volume analysis based on modified Cholesky factorization for endmember extraction

Jing Li Xiao-run Li Li-jiao Wang Liao-ying Zhao 《浙江大学学报:C卷英文版》2016,17(3):250-257

Endmember extraction is a key step in the hyperspectral image analysis process. The kernel new simplex growing algorithm (KNSGA), recently developed as a nonlinear alternative to the simplex growing algorithm (SGA), has proven a promising endmember extraction technique. However, KNSGA still suffers from two issues limiting its application. First, its random initialization leads to inconsistency in final results; second, excessive computation is caused by the iterations of a simplex volume calculation. To solve the first issue, the spatial pixel purity index (SPPI) method is used in this study to extract the first endmember, eliminating the initialization dependence. A novel approach tackles the second issue by initially using a modified Cholesky factorization to decompose the volume matrix into triangular matrices, in order to avoid directly computing the determinant tautologically in the simplex volume formula. Theoretical analysis and experiments on both simulated and real spectral data demonstrate that the proposed algorithm significantly reduces computational complexity, and runs faster than the original algorithm.
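To make concrete why a triangular factorization helps with the simplex volume, the sketch below computes the volume from the Gram matrix G of edge vectors: if G = L L^T, then sqrt(det G) is the product of the diagonal of L, so no explicit determinant is needed. This is a minimal sketch under stated assumptions, not the paper's algorithm. It uses the textbook Cholesky factorization in place of the authors' modified variant (which the abstract does not specify), the names `cholesky_lower` and `simplex_volume` are hypothetical, and the vertices are assumed affinely independent so that G is positive definite.

```python
import math


def cholesky_lower(G):
    """Textbook Cholesky factorization: return lower-triangular L with G = L L^T."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(G[i][i] - s)
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return L


def simplex_volume(vertices, kernel=None):
    """Volume of the simplex on n+1 vertices, using only kernel evaluations.

    Assumption: with kernel=None the ordinary dot product is used, which gives
    the usual Euclidean volume; any positive-definite kernel can be passed instead.
    """
    if kernel is None:
        kernel = lambda x, y: sum(a * b for a, b in zip(x, y))
    v0, rest = vertices[0], vertices[1:]
    n = len(rest)
    # Gram matrix of the edge vectors (v_i - v_0), expanded into kernel terms
    G = [[kernel(rest[i], rest[j]) - kernel(rest[i], v0)
          - kernel(v0, rest[j]) + kernel(v0, v0)
          for j in range(n)] for i in range(n)]
    L = cholesky_lower(G)
    # sqrt(det G) equals the product of the Cholesky diagonal
    sqrt_det = math.prod(L[i][i] for i in range(n))
    return sqrt_det / math.factorial(n)


# Usage: area of a right triangle with legs 2 and 3 (illustrative data only)
area = simplex_volume([(0, 0), (2, 0), (0, 3)])
```

When a simplex is grown one vertex at a time, only one new row of the factor has to be computed per candidate, which is the kind of saving a factorization-based approach can exploit.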

20.

Risk and effectiveness criteria for using on-product warnings

《Ergonomics》2012,55(11):2164-2175

A variety of potential hazards can be identified for nearly any consumer product, often more than can be practically or effectively addressed with warning labels. Published standards and guidelines for warnings do not offer a reasonable basis for limiting the number and length of warning labels. This paper proposes criteria for the use and design of warning labels based on effectiveness research, accident data, and product-associated risk.