首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 750 毫秒
1.
This paper makes an improvement of computing two nearest-neighbor problems of images on a reconfigurable array of processors (RAP) by increasing the bus width between processors. Based on a base-n system, a constant time algorithm is first presented for computing the maximum/minimum of N log N-bit unsigned integers on a RAP using N processors each with N1/c-bit bus width, where c is a constant and c ≥ 1. Then, two basic operations such as image component labeling and border following are also derived from it. Finally, these algorithms are used to design two constant time algorithms for the nearest neighbor black pixel and the nearest neighbor component problems on an N1/2 × N1/2 image using N1/2 × N1/2 processors each with N1/c-bit bus width, where c is a constant and c ≥ 1. Another contribution of this paper is that the execution time of the proposed algorithms is tunable by the bus width.  相似文献   

2.
Parallel algorithms for solving the satisfaction problem of non-trivial functional and multivalued data dependencies (FDs and MVDs) in a relation of N tuples by M processors are developed in this paper. Algorithms performing, in a parallel manner, batch or interactive checking of these data dependencies are also discussed. The M processors are organized as a linear systolic array. The time complexities of the first two algorithms for solving the FD satisfaction problem under M N are both O(N), and that of Algorithm (3) or (4) for solving the FD or MVD satisfaction problem under N M is O(N2/M). The latter complexity reduced to O(N) if N = M and is at least not worse than O(N log N) if N = M (N/log N).  相似文献   

3.
This paper describes several parallel algorithms for image edge relaxation on array processors with different numbers of processing elements (PEs) connected by a mesh or hypercube network. The time complexity of Prager's original edge relaxation scheme is O(N2) per iteration using floating-point operations on a sequential machine, where N2 is the number of pixels in the image. Modifications to the scheme are made so that no multiplications are employed and only integer operations are required. Moreover, with parallel processing, the time complexity per iteration is reduced to some constant value. A time complexity analysis on two parallel algorithms is performed. Although the algorithm on an array processor with 4N2 PEs achieved higher degree of parallelism, the algorithm with N2 PEs is preferred. Further modifications on the latter algorithm are made to accommodate to fewer PEs.  相似文献   

4.
Nested dissection is a very popular direct method for solving sparse linear systems that arise from finite difference and finite element methods. Worley and Schreiber [16] give a fine grain algorithm for a square array of processors. Their algorithm uses O(N2) processors, each with O(N) memory, to factor an N2 by N2 sparse matrix whose graphs is an N × N mesh. The efficiency of their method is between 1/46 and 1/12. George et al. [6] [8] give a medium grain algorithm for hypercube architecture, while George et al. [7] give an algorithm for shared memory machines. These papers present a column oriented approach which can exploit O(N) parallelism and yield efficiencies up to 50%. Lucas [11] also gives a column oriented scheme which achieves up to 75% efficiency and O(N) parallelism. In this paper, we present a medium to fine grain algorithm for a P × P array of processors with local memory. This algorithm can exploit up to O(N2) parallelism. The efficiency of the fine grain version is comparable to [16] while as a medium grain algorithm achieves about 49% efficiency. The strength of the method is due to three factors: its ability to pipeline much of the computation, overlapping computation and communication, and the use of level 3 BLAS like primitives. In addition to its high efficiency its memory requirement is optimal, only O(N2 log N/P2) words memory is needed per processor.  相似文献   

5.
Stphane 《Pattern recognition》1995,28(12):1993-2000
We propose a parallel thinning algorithm for binary pictures. Given an N × N binary image including an object, our algorithm computes in O(N2) the skeleton of the object, using a pyramidal decomposition of the picture. The behavior of this algorithm is studied considering a family of digitalization of the same object at a different level of resolution. With the Exclusive Read Exclusive Write (EREW) Parallel Random Access Machine (PRAM), our algorithm runs in O(log N) time using O(N2/logN) processors and it is work-optimal. The same result is obtained with high-connectivity distributed memory SIMD machines having strong hypercube and pyramid. We describe the basic operator, the pyramidal algorithm and some experimental results on the SIMD MasPar parallel machine.  相似文献   

6.
In this paper, a distributed selectsort algorithm and a parameterized selectsort algorithm are presented to be applied on distributed systems for cases when N P where N is the number of elements to be sorted and P is the number of processors in the system. The distributed system considered in this paper uses a broadcasting channel for communication between processors. We show that the number of messages required for the parameterized selectsort algorithm is independent of N and is of complexity O(P), which is optimal in a distributed system with P processors. Furthermore, the amount of communication required in terms of elements is N + O(P3) and the computation time complexity is O((N/P)lgN + P2lg(N/P)). Hence, when N P3, the computation time complexity is O((N/P)lgN), which is optimal using P processors. In addition, this parameterized algorithm provides us with a parameter K such that by choosing the value of K allows us to trade among processing requirement, memory requirement, and communication requirement. It is shown that this parameterized algorithm can reduce the communication requirements significantly while only slightly increasing the computation requirements.  相似文献   

7.
Parallel clustering algorithms   总被引:3,自引:0,他引:3  
Clustering techniques play an important role in exploratory pattern analysis, unsupervised learning and image segmentation applications. Many clustering algorithms, both partitional clustering and hierarchical clustering, require intensive computation, even for a modest number of patterns. This paper presents two parallel clustering algorithms. For a clustering problem with N = 2n patterns and M = 2m features, the time complexity of the traditional partitional clustering algorithm on a single processor computer is O(MNK), where K is the number of clusters. The proposed algorithm on anSIMD computer with MN processors has a time complexity O(K(n + m)). The time complexity of the proposed single-link hierarchical clustering algorithm is reduced from O(MN2) of the uniprocessor algorithm to O(nN) with MN processors.  相似文献   

8.
W.A.  H.J. 《Pattern recognition》1995,28(12):1985-1992
A fast digital Radon transform based on recursively defined digital straight lines is described, which has the sequential complexity of N2 log N additions for an N × N image. This transform can be used to evaluate the Hough transform to detect straight lines in a digital image. Whilst a parallel implementation of the Hough transform algorithm is difficult because of global memory access requirements, the fast digital Radon transform is vectorizable and therefore well suited for parallel computation. The structure of the fast algorithm is shown to be quite similar to the FFT algorithm for decimation in frequency. It is demonstrated that even for sequential computation the fast Radon transform is an attractive alternative to the classical Hough transform algorithm.  相似文献   

9.
An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.

The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n2) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n2) processors.  相似文献   


10.
We substantially improve the known algorithms for approximating all the complex zeros of an nth degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc x : |x| ≤ 1, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2b, by using order of n arithmetic operations and order of (b + n)n2 Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n2 Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).  相似文献   

11.
The problem of finding a rectilinear minimum bend path (RMBP) between two designated points inside a rectilinear polygon has applications in robotics and motion planning. In this paper, we present efficient algorithms to solve the query version of the RMBP problem for special classes of rectilinear polygons given their visibility graphs. Specifically, we show that given an unweighted graph G = (V, E), with ¦V¦ = N and ¦E¦ = M, algorithms to preprocess G in linear space and time such that the shortest distance queries — queries asking for the distance between any pair of nodes in the graph — can be answered in constant time and space are presented in this paper. For the case of a chordal graph G, our algorithms give a distance which is at most one away from the actual shortest distance. When G is a K-chordal graph, our algorithm produces an exact shortest distance in O(K) time. We also present a non-trivial parallel implementation of the sequential preprocessing algorithm for the CREW-PRAM model which runs in O(log2 N) time using O(N + M) processors. After the preprocessing, we can answer the queries in constant time using a single processor.  相似文献   

12.
We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations.

The classical Jacobi method is known to take O(n4) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n3 log2n) time. One of our new methods also takes O(n3 log2n) time, but the other one takes only O(n3) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled.

Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems.  相似文献   


13.
Two parallel algorithms for finding minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message-flow to between the neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with total-ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O(m log m)/p) time using p processors for p ≤ (mlog m)/n(1+log(m/n)). The second algorithm, efficient for dense graphs, requires O(n2/p) time for pn/log n.  相似文献   

14.
In this paper, we consider the problem of recognizing whether a given network is a rectangular mesh. We present an efficient distributed algorithm with an O(N) message and time complexity, where N is the number of nodes in the network. This is an improvement of a previous algorithm presented in Mohan (1990) with a message complexity of O(N log N) and time complexity of O(N1.6). The proposed algorithm is constructive in nature and also assigns coordinates to the nodes of the network.  相似文献   

15.
A subsequence of a given string is any string obtained by deleting none or some symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is as long as any other common subsequences. The problem is to find the LCS of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in strings is infinite, where m and n are the lengths of the two strings, respectively, and m⩽n. In this paper, we propose two parallel algorithms far this problem on the CREW-PRAM model. One takes O(log2 m + log n) time with mn/log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log2 m log log m) time with mn/(log2 m log log m) processors when log2 m log log m > log n, or otherwise O(log n) time with mn/log n processors, which is optimal in the sense that the time×processors bound matches the complexity bound of the problem. Both algorithms exploit nice properties of the LCS problem that are discovered in this paper  相似文献   

16.
This paper describes some new techniques for the rapid evaluation and fitting of radial basic functions. The techniques are based on the hierarchical and multipole expansions recently introduced by several authors for the calculation of many-body potentials. Consider in particular the N term thin-plate spline, s(x) = Σj=1N djφ(xxj), where φ(u) = |u|2log|u|, in 2-dimensions. The direct evaluation of s at a single extra point requires an extra O(N) operations. This paper shows that, with judicious use of series expansions, the incremental cost of evaluating s(x) to within precision ε, can be cut to O(1+|log ε|) operations. In particular, if A is the interpolation matrix, ai,j = φ(xixj, the technique allows computation of the matrix-vector product Ad in O(N), rather than the previously required O(N2) operations, and using only O(N) storage. Fast, storage-efficient, computation of this matrix-vector product makes pre-conditioned conjugate-gradient methods very attractive as solvers of the interpolation equations, Ad = y, when N is large.  相似文献   

17.
The parallel stratagem in this paper uses scattered square decomposition, introduced by G. Fox, for its data assignment and then exploits parallelism in the solution steps of the sequential Householder tridiagonalization algorithm. One may condense a real symmetric full matrix A of order n into a tridiagonal form by the stratagem in concurrent machines where N(= D2) processors are used. Expressions for efficiency and speedup are given for the evaluation of the stratagem. An alternative stratagem which requires less data transmission but more computations is also discussed. The results shown that the Householder Method of tridiagonalization may be implemented on a concurrent machine efficiently by scattered square decomposition provided that the number of matrix elements contained in each processor is much larger than the number of processors of the concurrent machine, and the ratio of the time to transmit one data item from one processor to any other processor to the time to perform a floating-point arithmetic operation is small enough.  相似文献   

18.
An efficient parallel algorithm for merging two sorted lists is presented. The algorithm is based on a novel partitioning algorithm that splits the two lists among the processors, in a way that ensures load balance during the merge. The partitioning algorithm can itself be efficiently parallelized, allowing the solution to scale with increased numbers of processors. A shared memory multiprocessor is assumed. The time complexity for partitioning and merging is O(N/p + log N), where p is the number of processors and N is the total number of elements in the two lists. Implementation results on a twenty node Sequent Symmetry multiprocessor are also presented.  相似文献   

19.
A parallel two-list algorithm for the knapsack problem   总被引:10,自引:0,他引:10  
An n-element knapsack problem has 2n possible solutions to search over, so a task which can be accomplished in 2″ trials if an exhaustive search is used. Due to the exponential time in solving the knapsack problem, the problem is considered to be very hard. In the past decade, much effort has been done in order to find techniques which could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs O(2n/8) processors to solve the knapsack problem in O(2n/2) time; that is, the cost of Chang et al.'s parallel algorithm is O(25n/8). In this paper, we propose a parallel algorithm to improve Chang et al.'s parallel algorithm by reducing the time complexity to be O(23n/8) under the same O(2n/8) processors available. Thus, the proposed parallel algorithm has a cost of O(2n/2). It is an improvement over previous literature. We believe that the proposed parallel algorithm is pragmatically feasible at the moment when multiprocessor systems become more and more popular.  相似文献   

20.
Given N scattered data points, we examined the problem of finding N variable Multiquadric (MQ) shape-parameters, or R2 values. Because the problem of finding the optimal R2 values is a nonlinear one, we optimized these parameters numerically by minimizing the root-mean-square (RMS) errors. The resulting R2 values varied over many orders of magnitude. We have tested this approach on a number of univariate and bivariate (Franke's) problems, and found that the RMS error reduction was substantial.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号