Similar literature
1.
We study the problem of sparse-matrix dense-vector multiplication (SpMV) in external memory. The task of SpMV is to compute y:=Ax, where A is a sparse N×N matrix and x is a vector. We express sparsity by a parameter k, and for each choice of k consider the class of matrices where the number of nonzero entries is kN, i.e., where the average number of nonzero entries per column is k.
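The product y := Ax described above can be sketched in a few lines. This is only an in-memory illustration, not the external-memory algorithm the paper studies; the compressed sparse row (CSR) layout, the function name `spmv_csr`, and the example matrix are assumptions made for this sketch.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Return y = A x, with A given in compressed sparse row (CSR) form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[p] * x[col_idx[p]]
    return y


# Hypothetical 3x3 matrix with kN = 6 nonzeros (N = 3, k = 2):
#     [1 0 2]
#     [0 3 4]
#     [5 0 6]
row_ptr = [0, 2, 4, 6]
col_idx = [0, 2, 1, 2, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)  # [7.0, 18.0, 23.0]
```

Each nonzero is touched exactly once, so the arithmetic cost is proportional to kN; the accesses to x are the irregular part that makes the memory behaviour interesting.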

2.
In this paper, a new methodology for speeding up Matrix–Matrix Multiplication using the Single Instruction Multiple Data (SIMD) unit, on one or more cores sharing a cache, is presented. This methodology achieves higher execution speed than the state-of-the-art ATLAS library (speedups from 1.08 up to 3.5) by decreasing the number of instructions (load/store and arithmetic) and the number of data cache accesses and misses in the memory hierarchy. This is achieved by fully exploiting the software characteristics (e.g. data reuse) and hardware parameters (e.g. data cache sizes and associativities) as one problem and not separately, giving high-quality solutions and a smaller search space.

3.
Branch-and-Bound (B&B) algorithms are tree-based exploratory methods for solving combinatorial optimization problems exactly to optimality. These problems are often large in size and known to be NP-hard to solve. The construction and exploration of the B&B-tree are performed using four operators: branching, bounding, selection and pruning. Such algorithms are irregular, which makes their parallel design and implementation on GPU challenging. Existing GPU-accelerated B&B algorithms perform only a part of the algorithm on the GPU and rely on the transfer of pools of subproblems across the PCI Express bus to the device. To the best of our knowledge, the algorithm presented in this paper is the first GPU-based B&B algorithm that performs all four operators on the device and consequently avoids the data transfer bottleneck between CPU and GPU. The implementation on GPU is based on the Integer–Vector–Matrix (IVM) data structure, which is used instead of a conventional linked list to store and manage the pool of subproblems. This paper revisits the IVM-based B&B algorithm on the GPU, addressing the irregularity of the algorithm in terms of workload, memory access patterns and control flow. In particular, the focus is put on reducing thread divergence by making a judicious choice for the mapping of threads onto the data. Compared to a GPU-accelerated B&B based on a linked list, the algorithm presented in this paper solves a set of standard flowshop instances on average 3.3 times faster.

4.
Wang–Landau sampling is implemented on the Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA). Performance on three different GPU cards, including the new generation Fermi architecture card, is compared with that on a Central Processing Unit (CPU). The parameters for massively parallel Wang–Landau sampling are tuned in order to achieve fast convergence. For simulations of the water cluster systems, we obtain an average speedup of over 50 times for a given workload.

5.
The microprocessor industry has responded to memory, power and ILP walls by turning to many-core processors, increasing parallelism as the primary method to improve processor performance. These processors are expected to consist of tens or even hundreds of cores. One of these future processors is the 48-core experimental processor Single-Chip Cloud Computer (SCC). The SCC was created by Intel Labs as a platform for many-core software research.

6.
Face detection is a key component in applications such as security surveillance and human–computer interaction systems, and real-time recognition is essential in many scenarios. The Viola–Jones algorithm is an attractive means of meeting the real-time requirement, and has been widely implemented on custom hardware, FPGAs and GPUs. We demonstrate a GPU implementation that achieves competitive performance, but with low development costs. Our solution treats the irregularity inherent to the algorithm using a novel dynamic warp scheduling approach that eliminates thread divergence. This new scheme also employs a thread pool mechanism, which significantly alleviates the cost of creating, switching, and terminating threads. Compared to static thread scheduling, our dynamic warp scheduling approach reduces the execution time by a factor of 3. To maximize detection throughput, we also run on multiple GPUs, realizing 95.6 FPS on 5 Fermi GPUs.

7.
Target detection in clutter is a fundamental problem in radar signal processing. When the received radar signal contains only a few pulses, it is difficult to achieve satisfactory performance using the traditional detection algorithm. Recently, a generalized constant false alarm rate (CFAR) detector on the Riemannian manifold of Hermitian positive-definite (HPD) matrices was proposed. This detector, which compares the Riemannian distance between the covariance matrix of the cell under test (CUT) and an average matrix of reference cells with a given threshold, has significantly improved the detection performance. However, the application of this detector in real scenarios is still limited by two problems: it is computationally expensive, and the detection performance is not very good since the Riemannian distance is utilized. In this paper, the symmetrized Kullback–Leibler (sKL) and the total Kullback–Leibler (tKL) divergences, instead of the Riemannian distance, are used as dissimilarity measures in the matrix CFAR detector. According to the sKL and tKL divergences, three average matrices, the sKL mean, the sKL median, and the tKL t-center, are derived. Furthermore, the relationship between the detection performance and the anisotropy of the distance measure used in the matrix CFAR detector is explored. Numerical experiments and real radar sea clutter data are given to confirm the superiority of the proposed algorithms in terms of computational complexity and detection performance.

8.
9.
Existing margin-based discriminant analysis methods, such as nonparametric discriminant analysis, use the K-nearest neighbor (K-NN) technique to characterize the margin. Manifold learning–based methods use the K-NN technique to characterize the local structure. These methods share a common problem: the nearest neighbor parameter K must be chosen in advance, and how to choose an optimal K is a theoretically difficult problem. In this paper, we present a new margin characterization method named sparse margin–based discriminant analysis (SMDA) using sparse representation. SMDA avoids the difficulty of parameter selection. Sparse representation can be considered a generalization of the K-NN technique: for a test sample, it adaptively selects the training samples that give the most compact representation. We characterize the margin by sparse representation. The proposed method is evaluated on the AR database, the Extended Yale B database, and the CENPARMI handwritten numeral database. Experimental results show the effectiveness of the proposed method; its performance is better than that of some other state-of-the-art feature extraction methods.

10.
This paper presents a visualization method called the deformed cube for visualizing a 3D velocity vector field. Based on the decomposition of the tensor that describes the changes of the velocity, it provides a technique for visualizing local flow. A deformed cube, a cube transformed by a tensor in a local coordinate frame, shows the local stretch, shear and rigid body rotation of the local flow corresponding to the decomposed components of the tensor. Users can interactively view the local deformation or any component of the changes. The animation of the deformed cube moving along a streamline gives a more global impression of the flow field. This method is intended as a complement to global visualization methods.
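One common way to split such a tensor into a deformation part and a rotation part is the symmetric/antisymmetric decomposition of the velocity gradient. The sketch below assumes that decomposition; the abstract does not say which one the paper uses, and the function name and the example flow are hypothetical.

```python
def decompose_velocity_gradient(J):
    """Split a square velocity-gradient tensor J into S + W.

    S = (J + J^T) / 2 is symmetric and carries stretch and shear.
    W = (J - J^T) / 2 is antisymmetric and carries rigid body rotation.
    """
    n = len(J)
    S = [[0.5 * (J[i][j] + J[j][i]) for j in range(n)] for i in range(n)]
    W = [[0.5 * (J[i][j] - J[j][i]) for j in range(n)] for i in range(n)]
    return S, W


# Hypothetical simple shear flow u = (y, 0, 0): only du_x/dy is nonzero.
J = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

S, W = decompose_velocity_gradient(J)
print(S[0][1], S[1][0])  # 0.5 0.5   (shear, symmetric part)
print(W[0][1], W[1][0])  # 0.5 -0.5  (rotation, antisymmetric part)
```

Applying S and W separately to the corners of a unit cube would show the shear and rotation contributions as two different deformations of the same cube.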

11.
We consider the problem of scheduling a set of jobs on a set of identical parallel machines where the objective is to minimize the total weighted earliness and tardiness penalties with respect to a common due date. We propose a hybrid heuristic algorithm for constructing good solutions, combining priority rules for assigning jobs to machines and a local search with exact procedures for solving the one-machine subproblems. These solutions are then used in two metaheuristic frameworks, Path Relinking and Scatter Search, to obtain high quality solutions for the problem. The algorithms are tested on a large number of test instances to assess the efficiency of the proposed strategies. The results show that our algorithms consistently outperform the best reported results for this problem.
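The objective named above can be evaluated for a fixed schedule as follows. This is a sketch of the objective function only, not of the paper's heuristics; it assumes every machine starts at time 0 with no idle time between jobs, and the function name, job tuples and numbers are hypothetical.

```python
def weighted_earliness_tardiness(machines, due_date):
    """Total weighted earliness plus tardiness against a common due date.

    machines: one job sequence per machine; each job is a tuple
              (processing_time, earliness_weight, tardiness_weight).
    """
    total = 0
    for sequence in machines:
        t = 0  # assumption: each machine starts at time 0 and never idles
        for processing_time, alpha, beta in sequence:
            t += processing_time  # completion time of this job
            total += alpha * max(0, due_date - t) + beta * max(0, t - due_date)
    return total


# Hypothetical instance: two machines, common due date 5.
schedule = [
    [(2, 1, 2), (3, 1, 2), (4, 1, 3)],  # completions 2, 5, 9 -> 3 + 0 + 12
    [(4, 2, 1), (3, 1, 1)],             # completions 4, 7    -> 2 + 2
]
print(weighted_earliness_tardiness(schedule, 5))  # 19
```

A heuristic of the kind described would call such an evaluation repeatedly while changing the job-to-machine assignment and the order within each machine.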

12.
Many target tracking problems can be cast as joint tracking problems in which the underlying target state can only be observed through its relationship with a latent variable. When both the observations and the latent variable are uncertain, target tracking becomes a variational problem, and the expectation–maximization (EM) method provides an iterative procedure, under a Bayesian inference framework, for estimating the state of the target while minimizing the latent variable uncertainty. In this paper, we treat the joint tracking problem using a unified framework under the EM method and provide a comprehensive overview of various EM approaches in the joint tracking context from the viewpoints of their necessity, benefits, and challenges. Some examples of applying EM are presented. In addition, future research directions and open issues for using the EM method in joint tracking are given.

13.
An automatic onset detection and picking algorithm is proposed that applies the spectro-ratio to time–frequency sub-bands. The proposed algorithm does not need any parameter settings, as it works on data generated by either short-period or very broadband seismometers. The algorithm is applied to local events from the Cairo region recorded by three stations of the Egyptian National Seismic Network (ENSN). The maximum standard deviation relative to the corresponding manual picks made by analysts is 0.113 s.

14.
15.
Computer Languages, 1988, 13(3-4): 143-147
At the Center for Multidisciplinary Studies, University of Belgrade, Yugoslavia, a high-level language for easily manipulating fuzzy set operations was developed. In this paper, after a brief introduction to the theory of fuzzy sets, some features and possibilities are described from the user's point of view, and an application in picture enhancement is shown. The language presented was written in BASIC, so any microcomputer for personal or home use should be able to execute RASP.
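For readers unfamiliar with fuzzy sets, the three basic operations can be sketched as below. This uses the standard max/min/complement definitions and says nothing about RASP's actual syntax or feature set; the function names and membership values are hypothetical.

```python
def fuzzy_union(a, b):
    """Membership of the union: element-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_intersection(a, b):
    """Membership of the intersection: element-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_complement(a):
    """Membership of the complement: one minus the membership degree."""
    return [1.0 - x for x in a]


# Hypothetical membership degrees over the same four elements,
# e.g. "brightness" of four pixels scaled to the range [0, 1].
a = [0.0, 0.25, 0.75, 1.0]
b = [0.5, 0.5, 0.5, 0.5]

print(fuzzy_union(a, b))         # [0.5, 0.5, 0.75, 1.0]
print(fuzzy_intersection(a, b))  # [0.0, 0.25, 0.5, 0.5]
print(fuzzy_complement(a))       # [1.0, 0.75, 0.25, 0.0]
```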

16.
17.
18.
In this paper, we develop a diagnosis model based on particle swarm optimization (PSO), support vector machines (SVMs) and association rules (ARs) to diagnose erythemato-squamous diseases. The proposed model consists of two stages: first, AR is used to select the optimal feature subset from the original feature set; then a PSO-based approach is used to find the best parameters of the SVM kernel function, since the kernel parameter setting in the SVM training procedure significantly influences the classification accuracy and PSO is a promising tool for global search. Experimental results show that the proposed AR_PSO–SVM model achieves 98.91% classification accuracy using 24 features of the erythemato-squamous diseases dataset taken from the UCI (University of California at Irvine) machine learning database. We therefore conclude that the proposed method is very promising compared to previously reported results.

19.
20.