Similar Literature
1.
The distance transform (DT) is an image computation tool which can be used to extract information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a grey-level image, where each pixel has a value corresponding to the distance to the nearest foreground pixel. The time complexity for computing the distance transform depends strongly on the distance metric used; in particular, the more exact the distance transform is, the longer the execution time becomes. Nowadays, thousands of images often have to be processed within a limited time, and it is practically impossible for a sequential computer to compute their distance transforms in real time. In order to provide efficient distance transform computation, it is highly desirable to develop a parallel algorithm for this operation. In this paper, based on the diagonal propagation approach, we first provide an O(N^2) time sequential algorithm to compute the chessboard distance transform (CDT) of an N×N image, which is a DT using the chessboard distance metric. Based on the proposed sequential algorithm, the CDT of a 2-D binary image array of size N×N can be computed in O(log N) time on the EREW PRAM model using O(N^2/log N) processors, in O(log log N) time on the CRCW PRAM model using O(N^2/log log N) processors, and in O(log N) time on the hypercube computer using O(N^2/log N) processors. Following the mapping proposed by Lee and Horng, an algorithm for the medial axis transform is also efficiently derived. The medial axis transform of a 2-D binary image array of size N×N can be computed in O(log N) time on the EREW PRAM model using O(N^2/log N) processors, in O(log log N) time on the CRCW PRAM model using O(N^2/log log N) processors, and in O(log N) time on the hypercube computer using O(N^2/log N) processors. The proposed parallel algorithms are composed of a set of prefix operations. In each prefix operation phase, only the increase (add-one) operation and the minimum operation are employed, so the algorithms are especially efficient in practical applications.
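As a rough illustration of the operation being parallelized (not the paper's parallel algorithm), here is a minimal sequential sketch of the chessboard distance transform using the standard two-pass propagation over a binary image; the function and variable names are illustrative only.

```python
# Minimal sequential sketch of the chessboard distance transform (CDT).
# Illustrative only; the paper's contribution is the O(log N) parallel algorithms.
INF = float("inf")

def chessboard_dt(image):
    """image: 2-D list of 0/1, where 1 marks a foreground pixel.
    Returns a grid of chessboard distances to the nearest foreground pixel."""
    n = len(image)
    d = [[0 if image[i][j] else INF for j in range(n)] for i in range(n)]
    # Forward pass: propagate from the top-left (N, W, NW, NE neighbours).
    for i in range(n):
        for j in range(n):
            for di, dj in ((-1, 0), (0, -1), (-1, -1), (-1, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    d[i][j] = min(d[i][j], d[ii][jj] + 1)
    # Backward pass: propagate from the bottom-right (S, E, SE, SW neighbours).
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            for di, dj in ((1, 0), (0, 1), (1, 1), (1, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    d[i][j] = min(d[i][j], d[ii][jj] + 1)
    return d
```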

2.
Given an n-vertex simple polygon P, we address the following problems: (i) find the shortest path between two points s and d inside P, and (ii) compute the shortest-path tree between a single point s and each vertex of P (which implicitly represents all the shortest paths). We show how to solve the first problem in O(log n) time using O(n) processors, and the more general second problem in O(log^2 n) time using O(n) processors, for any simple polygon P. We assume the CREW PRAM shared-memory model of computation, in which concurrent reads are allowed but no two processors may attempt to write simultaneously to the same memory location. The algorithms are based on the divide-and-conquer paradigm and are quite different from the known sequential algorithms. Research supported by the Faculty of Graduate Studies and Research (McGill University) grant 276-07.

3.
Y.-J. Chen, S.-J. Horng. Computing, 1997, 59(2): 95–114
Representing a region of a digital image as the union of the maximal upright squares contained in the region is called the medial axis transform. In this paper, we present an O(log n) time parallel algorithm for the medial axis transform of an n×n binary image on an SIMD mesh-connected computer with hyperbus broadcasting using n^3 processors.
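For intuition, a closely related sequential dynamic program computes, for every foreground pixel, the side of the largest upright all-foreground square having that pixel as its bottom-right corner; the medial axis transform then retains only the maximal squares. This sketch is illustrative and is not the paper's parallel algorithm; all names are assumptions.

```python
# Illustrative dynamic program related to the medial axis transform (MAT):
# s[i][j] = side of the largest upright all-foreground square whose
# bottom-right corner is (i, j).  Sequential sketch only.
def largest_squares(image):
    n = len(image)
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if image[i][j]:  # foreground pixel
                if i == 0 or j == 0:
                    s[i][j] = 1
                else:
                    s[i][j] = 1 + min(s[i - 1][j], s[i][j - 1], s[i - 1][j - 1])
    return s
```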

4.
We present three explicit schemes for distributing M variables among N memory modules, where M = Θ(N^1.5), M = Θ(N^2), and M = Θ(N^3), respectively. Each variable is replicated into a constant number of copies stored in distinct modules. We show that N processors, directly accessing the memories through a complete interconnection, can read/write any set of N variables in worst-case time O(N^(1/3)), O(N^(1/2)), and O(N^(2/3)), respectively, for the three schemes. The access times for the last two schemes are optimal with respect to the particular redundancy values used by such schemes. The address computation can be carried out efficiently by each processor without recourse to a complete memory map, requiring only O(1) internal storage. This paper was partially supported by NSF Grants CCR-91-96152 and CCR-94-00232, by ONR Contract N00014-91-J-4052, ARPA Order 8225, and by the ESPRIT III Basic Research Programme of the EC under Contract No. 9072 (Project GEPPCOM). Results reported here were presented in preliminary form at the 10th Symposium on Theoretical Aspects of Computer Science (Würzburg, Germany, 1993), and at the 5th ACM Symposium on Parallel Algorithms and Architectures (Velen, Germany, 1993).

5.
The solution of the uniform bicubic B-spline curve/surface fitting problem is considered. Based on the matrix perturbation method, this paper first presents a novel approximate O(n/p)-time parallel B-spline curve fitting algorithm for finding the n control points that interpolate n given data points on a linear array processor with p processors, where p ≤ n. Given m×n data points, we then present an O(mn/(p1 p2))-time parallel algorithm for solving the uniform bicubic B-spline surface fitting problem on a p1×p2 mesh-connected computer, where p1 ≤ m and p2 ≤ n. Relative error analyses of our two stable and cost-optimal parallel solvers are also given. When setting p1 = m and p2 = n, a constant-time parallel solver for B-spline surface fitting can be derived; this time- and cost-optimal result is a direct method, in contrast to the parallel iterative method of Cheng et al. (Parallel B-spline surface interpolation on a mesh-connected processor array, J. Parallel Distrib. Comput. 24, 2 (1995), 224–229).
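For background on the sequential problem being parallelized: interpolating n data points with a uniform cubic B-spline curve amounts to solving a tridiagonal linear system in the control points, since at interior points each data value must satisfy (P[i-1] + 4*P[i] + P[i+1])/6 = D[i]. Below is a minimal sequential sketch; the end condition (pinning the end control points to the end data points) and all names are illustrative assumptions, and the paper's matrix-perturbation-based parallel solver is not shown.

```python
def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a tridiagonal system:
    a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]  (a[0] and c[-1] unused)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = (c[i] / m) if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def bspline_curve_fit(D):
    """Find control points P so that the uniform cubic B-spline interpolates
    the data D (one coordinate):  (P[i-1] + 4*P[i] + P[i+1]) / 6 = D[i].
    End condition used here (an illustrative assumption): P[0] = D[0], P[-1] = D[-1]."""
    n = len(D)
    a = [1.0] * n
    b = [4.0] * n
    c = [1.0] * n
    d = [6.0 * x for x in D]
    a[0], b[0], c[0], d[0] = 0.0, 1.0, 0.0, D[0]
    a[-1], b[-1], c[-1], d[-1] = 0.0, 1.0, 0.0, D[-1]
    return solve_tridiagonal(a, b, c, d)
```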

6.
An active area of research in supercomputing is concerned with mapping certain finite sums, such as discrete Fourier transforms, onto arrays of processors. This paper presents systolic mapping techniques that exploit the parallelism inherent in discrete Fourier transforms. It is established that, for an M-dimensional signal, parallel executions of such transforms are closely related to mappings of an (M + 1)-dimensional finite vector space into itself. Three examples of such parallel schemes are then described for the discrete Fourier transform of a two-dimensional finite-extent sequence of size N1 × N2. The first is a linear array of N1 + N2 − 1 processors and takes O(N1 N2) steps. The second is an N1 × N2 rectangular array of processors and takes O(N1 + N2) steps, and the third is a hexagonal array which uses N1 N2 + (N2 − 1)(N1 + N2 − 1) processors and O(N1 + N2) steps. All three mappings are optimal in that they achieve asymptotically the highest speedup possible over the sequential execution of the same transform, and can easily be generalized to higher dimensions.
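For reference, the finite sum being mapped is the ordinary 2-D discrete Fourier transform of an N1 × N2 sequence; a direct (non-systolic) sketch of that definition, with illustrative names, is:

```python
import cmath

def dft2(x):
    """Direct 2-D DFT of an N1 x N2 complex sequence x (list of lists).
    Purely a reference definition; the paper maps this sum onto systolic arrays."""
    N1, N2 = len(x), len(x[0])
    X = [[0j] * N2 for _ in range(N1)]
    for k1 in range(N1):
        for k2 in range(N2):
            s = 0j
            for n1 in range(N1):
                for n2 in range(N2):
                    s += x[n1][n2] * cmath.exp(-2j * cmath.pi * (k1 * n1 / N1 + k2 * n2 / N2))
            X[k1][k2] = s
    return X
```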

7.
We present three explicit schemes for distributing M variables among N memory modules, where M = Θ(N^1.5), M = Θ(N^2), and M = Θ(N^3), respectively. Each variable is replicated into a constant number of copies stored in distinct modules. We show that N processors, directly accessing the memories through a complete interconnection, can read/write any set of N variables in worst-case time O(N^(1/3)), O(N^(1/2)), and O(N^(2/3)), respectively, for the three schemes. The access times for the last two schemes are optimal with respect to the particular redundancy values used by such schemes. The address computation can be carried out efficiently by each processor without recourse to a complete memory map, requiring only O(1) internal storage.

8.
Li, Jie; Pan, Yi; Shen, Hong. The Journal of Supercomputing, 2003, 24(3): 251–258
Topological sort of an acyclic graph has many applications such as job scheduling and network analysis. Due to its importance, it has been tackled on many models. Dekel et al. [3] proposed an algorithm for solving the problem in O(log^2 N) time on the hypercube or shuffle-exchange networks with O(N^3) processors. Chaudhuri [2] gave an O(log N) algorithm using O(N^3) processors on a CRCW PRAM model. On the LARPBS (Linear Arrays with a Reconfigurable Pipelined Bus System) model, Li et al. [5] showed that the problem for a weighted directed graph with N vertices can be solved in O(log N) time by using N^3 processors. In this paper, a more efficient topological sort algorithm is proposed on the same LARPBS model. We show that the problem can be solved in O(log N) time by using N^3/log N processors. The algorithm has better time and processor complexities than the best algorithm on the hypercube, and the same time complexity but better processor complexity than the best algorithm on the CRCW PRAM model.
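For context, the underlying sequential problem is ordinary topological sorting of a directed acyclic graph; a minimal sequential sketch (Kahn's algorithm), not the LARPBS algorithm itself, with illustrative names:

```python
from collections import deque

def topological_sort(n, edges):
    """n vertices 0..n-1, edges = list of (u, v) meaning u must precede v.
    Returns a topological order, or None if the graph contains a cycle."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order if len(order) == n else None
```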

9.
Previous research on developing parallel triangulation algorithms concentrated on triangulating planar point sets. Algorithms with O(log^3 n) running time using O(n) processors have been developed in Refs. 1 and 2. Atallah and Goodrich(3) presented a data structure that can be viewed as a parallel analogue of the sequential plane-sweeping paradigm, which can be used to triangulate a planar point set in O(log n log log n) time using O(n) processors. Recently, Merks(4) described an algorithm for triangulating point sets which runs in O(log n) time using O(n) processors, and is thus optimal. In this paper we develop a parallel algorithm for triangulating simplicial point sets in arbitrary dimensions, based on the idea of the sequential algorithm presented in Ref. 5. The algorithm runs in O(log^2 n) time using O(n/log n) processors. The product of its running time and number of processors is O(n log n); i.e., it achieves an optimal speed-up.

10.
We present several fast algorithms for multiple addition and prefix sums on the Linear Array with a Reconfigurable Pipelined Bus System (LARPBS), a recently proposed architecture based on optical buses. Our algorithm for adding N integers runs on an N log M-processor LARPBS in O(log* N) time, where log* N is the number of times the logarithm has to be taken to reduce N below 1 and M is the largest integer in the input. Our addition algorithm improves the time complexity of several matrix multiplication algorithms proposed by Li, Pan and Zheng (IEEE Trans. Parallel and Distributed Systems, 9(8):705–720, 1998). We also present several fast algorithms for computing prefix sums of N integers on the LARPBS. For integers with bounded magnitude, our first algorithm for prefix sum computation runs in O(log log N) time using N processors and in O(1) time using N^(1+ε) processors, for ε < 1. For integers with unbounded magnitude, the first algorithm for multiple addition runs in O(log log N log* N) time using N log M processors, where M is the largest integer in the input. Our second algorithm for multiple addition runs in O(log* N) time using N^(1+ε) log M processors, for ε < 1. We also show suitable extensions of our algorithms to real numbers.
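As background, the prefix-sum (scan) operation the paper accelerates is shown below, both as the sequential O(N) loop and as a recursive-doubling formulation whose log2(N) rounds mirror the structure that parallel models exploit (the doubling version is only simulated sequentially here; names are illustrative):

```python
def prefix_sums(a):
    """Sequential inclusive prefix sums: out[i] = a[0] + ... + a[i]."""
    out, running = [], 0
    for x in a:
        running += x
        out.append(running)
    return out

def prefix_sums_doubling(a):
    """Recursive-doubling scan: log2(N) rounds; in each round all positions
    could be updated in parallel, which is simulated sequentially here."""
    out = list(a)
    step = 1
    while step < len(out):
        out = [out[i] + (out[i - step] if i >= step else 0) for i in range(len(out))]
        step *= 2
    return out
```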

11.
This paper considers the histogramming problem on the hypercube. An N-PE hypercube is used to process an N^(1/2) × N^(1/2) digitized image in which each pixel has a gray-level value between 0 and M − 1. In general, M, the range of gray-level values, is much smaller than N, the number of pixels being processed. Our algorithm generates the histogram of the image in O(log M · log N) time using radix sort and efficient data-movement operations. The technique can also be implemented on butterfly, shuffle-exchange, and fat-pyramid organizations.
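The operation being parallelized is simply gray-level counting; a minimal sequential reference, with illustrative names, is:

```python
def histogram(image, M):
    """image: 2-D list of gray levels in 0..M-1.
    Returns h with h[g] = number of pixels having gray level g.
    Sequential reference only, not the hypercube algorithm."""
    h = [0] * M
    for row in image:
        for g in row:
            h[g] += 1
    return h
```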

12.
《Real》1996,2(6):373-382
This paper presents the architecture and the implementation of template matching on a 3-D piecewise-regular processor space that forms a two-dimensional array of linear systolic arrays. Template matching can be considered as a 2-D convolution of an image of size N × N with a kernel of size r × r. Conventional high-speed implementations use 2-D systolic arrays of size O(r^2) which compute in O(N^2) time. The drawback of this solution is that the size of the processor array is tied to the size of the convolution kernel, which does not permit allocating more processors in order to meet real-time requirements. With the approach used in this paper, the size of the processor array may be extended up to O(s·r^2), 1 ≤ s ≤ N, thereby accomplishing the calculations in O(N^2/s) time. In the case when s = r, an r × r mesh of 1-D systolic arrays of size O(r) is obtained. The piecewise regularity of the 3-D processor array also allows easy physical realization.
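A direct sequential sketch of the computation being systolized: sliding an r × r template over an N × N image and accumulating a correlation score at each position (the choice of a sum-of-products score and all names are illustrative assumptions):

```python
def template_match(image, template):
    """Sequential reference for template matching viewed as 2-D correlation:
    out[i][j] = sum over the r x r window anchored at (i, j) of image * template.
    Runs in O(N^2 r^2) time; the paper maps this computation onto systolic arrays."""
    N, r = len(image), len(template)
    out = [[0] * (N - r + 1) for _ in range(N - r + 1)]
    for i in range(N - r + 1):
        for j in range(N - r + 1):
            s = 0
            for a in range(r):
                for b in range(r):
                    s += image[i + a][j + b] * template[a][b]
            out[i][j] = s
    return out
```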

13.
The main results of this paper are efficient parallel algorithms, MSP and LOCATE, for computing minimal spanning trees and locating minimal paths in directed graphs, respectively. Algorithm MSP has time complexity O(log^3 n) using O(n^3/log n) processors, while LOCATE has time complexity O(log n) using O(n^2) processors. Algorithm MSP is derived from sequential algorithms, when the unbounded parallelism model is used.

14.
The recursive shortest spanning tree (RSST) algorithm has been used in various image and video coding systems, where speed requirements are very demanding. However, the RSST algorithm is too complex to perform in a reasonable time, which motivates the present work. This paper presents a distributed algorithm that constructs a recursive shortest spanning tree for image segmentation with a fixed number of processors. Using a tailored data-partition strategy to assign jobs to processors in our proposed parallel recursive shortest spanning tree (PRSST) algorithm, we derive a new lower bound of O(n) for one processor with an n-pixel image. The complexity of the proposed algorithm is cost-optimal, and the total number of messages required is O(c) for images of any size, where c is a small constant. The objective quality of segmented images produced by our PRSST differs by at most 1.5 dB from those generated by Morris's RSST algorithm for images of size 128 × 128 and 150 × 150 pixels.

15.
We give the first efficient parallel algorithms for solving the arrangement problem. We give a deterministic algorithm for the CREW PRAM which runs in nearly optimal bounds of O(log n log* n) time and n^2/log n processors. We generalize this to obtain an O(log n log* n)-time algorithm using n^d/log n processors for solving the problem in d dimensions. We also give a randomized algorithm for the EREW PRAM that constructs an arrangement of n lines on-line, in which each insertion is done in optimal O(log n) time using n/log n processors. Our algorithms develop new parallel data structures and new methods for traversing an arrangement. This work was supported by the National Science Foundation, under Grants CCR-8657562 and CCR-8858799, NSF/DARPA under Grant CCR-8907960, and Digital Equipment Corporation. A preliminary version of this paper appeared at the Second Annual ACM Symposium on Parallel Algorithms and Architectures [3].

16.
Shellsort with a constant number of increments
M. A. Weiss. Algorithmica, 1996, 16(6): 649–654
We consider the worst-case running time of Shellsort when only a constant number, c, of increments are allowed. For c = 3, we show that Shellsort can be implemented to run in O(N^(5/3)) time, which is optimal. For c = 4, we further improve the running time to O(N^(11/7)), and for c = 5 we obtain a bound of O(N^(23/15)). We also show an O(N^(1+1/k)) bound for general c, where k = ⌊(1 + √(8c+1))/4⌋. For c = 6, this is O(N^(3/2)).
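For reference, Shellsort with a constant number of increments simply runs one gapped insertion-sort pass per increment; a sketch for c = 3 (the concrete gap values below are illustrative, not the increment sequences analysed in the paper):

```python
def shellsort_const_increments(a, gaps=(9, 3, 1)):
    """Shellsort with a fixed, constant-length increment sequence.
    The final gap must be 1 so the last pass is a full insertion sort."""
    a = list(a)
    for gap in gaps:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
    return a
```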

17.
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α/log N processors. Such a parallel computation is cost-optimal and matches the performance of the PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable; that is, for all 1 ≤ p ≤ N^α/log N, multiplying two N×N matrices can be performed by a DMPC with p processors in O(N^α/p) time, i.e., linear speedup and cost optimality can be achieved in the range [1..N^α/log N]. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all 1 ≤ p ≤ N^α/log N, multiplying two N×N matrices can be performed by p processors connected by a hypercubic network in O(N^α/p + (N^2/p^(2/α))(log p)^(2(α−1)/α)) time, which implies that if p = O(N^α/(log N)^(2(α−1)/(α−2))), linear speedup can be achieved. Such a parallelization is highly scalable. These results represent significant progress in scalable parallel matrix multiplication (as well as in solving many other important problems) on distributed memory systems, both theoretically and practically.
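As a minimal illustration of the scalability claim for the baseline α = 3 algorithm, the O(N^3) work splits naturally into row blocks, one per processor, each costing O(N^3/p); this is only an illustrative sketch of that partition, not the DMPC algorithm, and all names are assumptions:

```python
def matmul_rows(A, B, row_lo, row_hi):
    """Multiply the row block A[row_lo:row_hi] by B with the standard triple loop.
    In a p-processor setting, each processor would own one such block of rows,
    doing O(N^3 / p) of the total O(N^3) work."""
    n = len(B)        # inner dimension (columns of A, rows of B)
    m = len(B[0])     # columns of B
    block = []
    for i in range(row_lo, row_hi):
        row = [0] * m
        for k in range(n):
            aik = A[i][k]
            for j in range(m):
                row[j] += aik * B[k][j]
        block.append(row)
    return block
```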

18.
Efficient algorithms to compute the Hough transform on MIMD and SIMD hypercube multicomputers are developed. Our algorithms can compute p angles of the Hough transform of an N × N image, p ≤ N, in O(p + log N) time on both MIMD and SIMD hypercubes. These algorithms require O(N^2) processors. We also consider the computation of the Hough transform on MIMD hypercubes with a fixed number of processors. Experimental results on an NCUBE/7 hypercube are presented. This research was supported by the National Science Foundation under grants DCR84-20935 and 86-17374. All correspondence should be mailed to Sanjay Ranka.
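A sequential sketch of the computation being parallelized: accumulating the (ρ, θ) Hough transform of an N × N binary edge image over p angles. The bin size and rounding convention below are illustrative choices, not those of the paper.

```python
import math

def hough_transform(image, p):
    """image: N x N 0/1 edge map; p: number of angles.
    acc[t][r] counts edge pixels lying on the line rho = x*cos(theta_t) + y*sin(theta_t)."""
    N = len(image)
    rho_max = int(math.ceil(math.sqrt(2) * N))
    acc = [[0] * (2 * rho_max + 1) for _ in range(p)]
    thetas = [t * math.pi / p for t in range(p)]
    for y in range(N):
        for x in range(N):
            if image[y][x]:
                for t, theta in enumerate(thetas):
                    rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
                    acc[t][rho + rho_max] += 1   # shift so negative rho indexes the array
    return acc
```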

19.
A cycle C passing through two specific vertices s and t of a biconnected graph is said to be an st-ambitus if its bridges do not interlace in a certain special way. We present algorithms for finding an st-ambitus in planar biconnected graphs, which are much simpler than the one known for general graphs [MT]. Our algorithm runs in O(n) time on a sequential machine and in O(log n) parallel time using O(n/log n) processors on an EREW PRAM.

20.
Trees are a useful data type, but they are not routinely included in parallel programming systems, in part because their irregular structure makes partitioning and scheduling difficult. We present a method for algebraically constructing implementations of tree skeletons, high-level homomorphic operations that execute in parallel. Many computations on binary trees can be performed in O(log n) parallel time using n processors, even taking account of communication costs. We extend these results to trees with arbitrary and variable degree. Then we show that it is possible to implement a distributed version of homomorphisms on binary trees, taking O(n/p + log^2 p) parallel time on p < n processors, for trees of any skew and taking full account of communication costs. Under slightly stronger restrictions on the underlying functions, this can be improved to O(n/p + log p). Furthermore, the technique for deriving distributed versions is algebraic, allowing the automatic generation of code for SPMD and data-parallel architectures.
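For concreteness, one simple form of binary-tree homomorphism combines a leaf function and an internal-node function bottom-up; on a balanced tree the combining steps of each level are independent and can run in parallel, level by level. The sketch below is sequential and purely illustrative; the class and function names are assumptions.

```python
class Node:
    """Binary tree node: leaves carry a value, internal nodes have two children."""
    def __init__(self, value=None, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def tree_hom(t, leaf_f, node_f):
    """Simple binary-tree homomorphism: apply leaf_f at leaves and node_f to
    combine the two recursive results at internal nodes."""
    if t.left is None and t.right is None:
        return leaf_f(t.value)
    return node_f(tree_hom(t.left, leaf_f, node_f),
                  tree_hom(t.right, leaf_f, node_f))

# Example: sum of leaf values.
# total = tree_hom(root, lambda v: v, lambda a, b: a + b)
```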
