
1.

Distribution-Independent Hierarchical Algorithms for the N-body Problem

Aluru Srinivas Gustafson John Prabhu G.M. Sevilgen Fatih E. 《The Journal of supercomputing》1998,12(4):303-323

The N-body problem is to simulate the motion of N particles under the influence of mutual force fields based on an inverse square law. Greengard's algorithm claims to compute the cumulative force on each particle in O(N) time for a fixed precision irrespective of the distribution of the particles. In this paper, we show that Greengard's algorithm is distribution dependent and has a lower bound of Ω(N log² N) in two dimensions and Ω(N log⁴ N) in three dimensions. We analyze the Greengard and Barnes-Hut algorithms and show that they are unbounded for arbitrary distributions. We also present a truly distribution independent algorithm for the N-body problem that runs in O(N log N) time for any fixed dimension.

2.

An Optimal Parallel Co-Connectivity Algorithm

Ka Wong Chong Stavros D. Nikolopoulos Leonidas Palios 《Theory of Computing Systems》2004,37(4):527-546

In this paper we consider the problem of computing the connected components of the complement of a given graph. We describe a simple sequential algorithm for this problem, which works on the input graph and not on its complement, and which for a graph on n vertices and m edges runs in optimal O(n+m) time. Moreover, unlike previous linear co-connectivity algorithms, this algorithm admits efficient parallelization, leading to an optimal O(log n)-time and O((n+m)/log n)-processor algorithm on the EREW PRAM model of computation. It is worth noting that, for the related problem of computing the connected components of a graph, no optimal deterministic parallel algorithm is currently available. The co-connectivity algorithms find applications in a number of problems. In fact, we also include a parallel recognition algorithm for weakly triangulated graphs, which takes advantage of the parallel co-connectivity algorithm and achieves an O(log² n) time complexity using O((n+m²)/log n) processors on the EREW PRAM model of computation.
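To illustrate what "working on the input graph and not on its complement" can look like, here is a minimal Python sketch of a well-known sequential technique: a breadth-first search that keeps a set of still-unvisited vertices and, for each dequeued vertex, moves every unvisited non-neighbour into the current component. This is not necessarily the algorithm of the paper. The function name `complement_components` and the edge-list input format are assumptions made for illustration. With hash sets, each scan step either removes a vertex for good or can be charged to an edge of the input graph, so the expected running time is roughly O(n+m).

```python
from collections import deque


def complement_components(n, edges):
    """Connected components of the complement of an undirected graph.

    Vertices are 0..n-1; `edges` lists the edges of the INPUT graph.
    The complement is never built explicitly.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    unvisited = set(range(n))
    components = []
    while unvisited:
        start = unvisited.pop()
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # Complement neighbours of u: unvisited vertices NOT adjacent to u.
            reachable = [v for v in unvisited if v not in adj[u]]
            for v in reachable:
                unvisited.remove(v)
                component.append(v)
                queue.append(v)
        components.append(sorted(component))
    return sorted(components)
```

For a star on four vertices with centre 0, the complement isolates the centre and joins the three leaves into a triangle, so the call returns two components.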

3.

Work-time optimal k-merge algorithms on the PRAM

Hayashi T. Nakano K. Olariu S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(3):275-282

For 2⩽k⩽n, the k-merge problem is to merge a collection of k sorted sequences of total length n into a new sorted sequence. The k-merge problem is fundamental as it provides a common generalization of both merging and sorting. The main contribution of this work is to give simple and intuitive work-time optimal algorithms for the k-merge problem on three PRAM models, thus settling the status of the k-merge problem. We first prove that Ω(n log k) work is required to solve the k-merge problem on the PRAM models. We then show that the EREW-PRAM and both the CREW-PRAM and the CRCW-PRAM require Ω(log n) time and Ω(log log n+log k) time, respectively, provided that the amount of work is bounded by O(n log k). Our first k-merge algorithm runs in Θ(log n) time and performs Θ(n log k) work on the EREW-PRAM. Finally, we design a work-time optimal CREW-PRAM k-merge algorithm that runs in Θ(log log n+log k) time and performs Θ(n log k) work. This latter algorithm is also work-time optimal on the CRCW-PRAM model. Our algorithms completely settle the status of the k-merge problem on the three main PRAM models.
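As a point of reference for the Θ(n log k) work bound, the following Python sketch shows the textbook sequential k-merge using a binary heap of size at most k. It is a sequential baseline only, not the PRAM algorithms of the paper. The function name `k_merge` is an assumption made for illustration.

```python
import heapq


def k_merge(sequences):
    """Merge k sorted sequences into one sorted list in O(n log k) time."""
    # Each heap entry is (value, sequence index, position in that sequence).
    heap = [(seq[0], i, 0) for i, seq in enumerate(sequences) if seq]
    heapq.heapify(heap)

    merged = []
    while heap:
        value, i, j = heapq.heappop(heap)
        merged.append(value)
        # Replace the popped element with its successor from the same sequence.
        if j + 1 < len(sequences[i]):
            heapq.heappush(heap, (sequences[i][j + 1], i, j + 1))
    return merged
```

With k=2 this reduces to ordinary merging, and with k=n (every sequence of length one) it reduces to heap-based sorting, which mirrors the remark that k-merge generalizes both problems.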

4.

Optimal parallel algorithms for finding proximate points, with applications

Hayashi T. Nakano K. Olariu S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(12):1153-1166

Consider a set P of points in the plane sorted by the x-coordinate. A point p in P is said to be a proximate point if there exists a point q on the x-axis such that p is the closest point to q over all points in P. The proximate points problem is to determine all the proximate points in P. Our main contribution is to propose optimal parallel algorithms for solving instances of size n of the proximate points problem. We begin by developing a work-time optimal algorithm running in O(log log n) time and using n/log log n Common-CRCW processors. We then go on to show that this algorithm can be implemented to run in O(log n) time using n/log n EREW processors. In addition to being work-time optimal, our EREW algorithm turns out to also be time-optimal. Our second main contribution is to show that the proximate points problem finds interesting, and quite unexpected, applications to digital geometry and image processing. As a first application, we present a work-time optimal parallel algorithm for finding the convex hull of a set of n points in the plane sorted by x-coordinate; this algorithm runs in O(log log n) time using n/log log n Common-CRCW processors. We then show that this algorithm can be implemented to run in O(log n) time using n/log n EREW processors. Next, we show that the proximate points algorithms afford us work-time optimal (resp. time-optimal) parallel algorithms for various fundamental digital geometry and image processing problems.

5.

Optimal Parallel Randomized Algorithms for the Voronoi Diagram of Line Segments in the Plane

Rajasekaran Ramaswami 《Algorithmica》2002,33(4):436-460

We present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n nonintersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with high probability using O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of work done since the sequential time bound for this problem is Ω(n log n). Our algorithm improves by an O(log n) factor the previously best known deterministic parallel algorithm, given by Goodrich, Ó'Dúnlaing, and Yap, which runs in O(log² n) time using O(n) processors. We obtain this result by using a new "two-stage" random sampling technique. By choosing large samples in the first stage of the algorithm, we avoid the hurdle of problem-size "blow-up" that is typical in recursive parallel geometric algorithms. We combine the two-stage sampling technique with efficient search and merge procedures to obtain an optimal algorithm. This technique gives an alternative optimal algorithm for the Voronoi diagram of points as well (all other optimal parallel algorithms for this problem use the transformation to three-dimensional half-space intersection).

6.

An improved constant-time algorithm for computing the Radon and Hough transforms on a reconfigurable mesh

Yi Pan Keqin Li Hamdi M. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》1999,29(4):417-421

The Hough transform is an important problem in image processing and computer vision. An efficient algorithm for computing the Hough transform has been proposed on a reconfigurable array by Kao et al. (1995). For a problem with a √N×√N image and an n×n parameter space, the algorithm runs in constant time on a three-dimensional (3-D) n×n×N reconfigurable mesh where the data bus is N^{1/c}-bit wide. To the best of our knowledge, this is the most efficient constant-time algorithm for computing the Hough transform on a reconfigurable mesh. In this paper, an improved Hough transform algorithm on a reconfigurable mesh is proposed. For the same problem, our algorithm runs in constant time on a 3-D n*n×n×√n√n reconfigurable mesh, where the data bus is only log N-bit wide. In most practical situations, n=O(√N). Hence, our algorithm requires much less VLSI area to accomplish the same task. In addition, our algorithm can compute the Radon transform (a generalized Hough transform) in O(1) time on the same model, whereas the algorithm in the above paper cannot easily be adapted to compute the Radon transform.

7.

A fast algorithm for computing a histogram on reconfigurable mesh (cited by: 1; self-citations: 0; citations by others: 1)

Ju-Wook Jang Heonchul Park Prasanna V.K. 《IEEE transactions on pattern analysis and machine intelligence》1995,17(2):97-106

The reconfigurable mesh captures salient features from a variety of sources, including the content addressable array parallel processor, the CHiP, the polymorphic-torus network and the bus automaton. It consists of an array of processors interconnected by a reconfigurable bus system. The bus system can be used to dynamically obtain various interconnection patterns between the processors. In this paper, we present a fast algorithm for computing the histogram of an N×N image with h grey levels in O(min{√h+log*(N/h),N}) time on an N×N reconfigurable mesh assuming each PE has a constant amount of local memory. This algorithm runs on the PARBUS and MRN/LRN models. In addition, histogram modification can be performed in O(√h) time on the same model. A variant of our algorithm runs in O(min{√h+log log(N/h),N}) time on an N×N RMESH in which each PE has constant storage. This result improves the known time and memory bounds for histogramming on the RMESH model.

8.

Massively parallel algorithms for trace-driven cache simulations

Nicol D.M. Greenberg A.G. Lubachevsky B.D. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(8):849-859

Considers the use of massively parallel architectures to execute a trace-driven simulation of a single cache set. A method is presented for the least-recently-used (LRU) policy, which, regardless of the set size C, runs in time O(log N) using N processors on the EREW (exclusive read, exclusive write) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. We present timings of this algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the least-frequently-used (LFU) and random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
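For readers unfamiliar with the task being parallelized, the sketch below shows a plain sequential trace-driven simulation of one LRU cache set in Python. It is only the sequential baseline that the parallel methods above compute faster, not any of the paper's algorithms. The function name `simulate_lru_set` and the representation of the trace as a list of line tags are assumptions made for illustration.

```python
from collections import OrderedDict


def simulate_lru_set(trace, capacity):
    """Count the misses of a single LRU cache set holding `capacity` lines.

    `trace` is the sequence of line tags referenced, in order.
    """
    lines = OrderedDict()  # ordered from least to most recently used
    misses = 0
    for tag in trace:
        if tag in lines:
            # Hit: mark the line as most recently used.
            lines.move_to_end(tag)
        else:
            misses += 1
            if len(lines) >= capacity:
                # Evict the least recently used line.
                lines.popitem(last=False)
            lines[tag] = True
    return misses
```

The sequential loop takes O(N) steps for a trace of length N; each step depends on the cache state left by the previous one, which is the dependency the parallel algorithms have to break.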

9.

Fast parallel algorithm for distance transform

Datta A. Soundaralakshmi S. 《IEEE transactions on systems, man, and cybernetics. Part A, Systems and humans : a publication of the IEEE Systems, Man, and Cybernetics Society》2003,33(4):429-434

We present an O((log log N)²)-time algorithm for computing the distance transform of an N × N binary image. Our algorithm is designed for the common concurrent read concurrent write parallel random access machine (CRCW PRAM) and requires O(N^{2+ε}/log log N) processors, for any ε such that 0 < ε < 1. Our algorithm is based on a novel deterministic sampling scheme and can be used for computing distance transforms for a very general class of distance functions. We also present a scalable version of our algorithm for the case when the number of available processors is p^{2+ε}/log log p for some p < N. In this case, our algorithm runs in O((N²/p²) + (N/p) log log p + (log log p)²) time. This scalable algorithm is more practical since usually the number of available processors is much less than the size of the image.

10.

An efficient algorithm for row minima computations on basic reconfigurable meshes

Nakano K. Olariu S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(6):561-569

A matrix A of size m×n containing items from a totally ordered universe is termed monotone if, for every i, j, 1⩽i2. In case m=n^ϵ for some constant ϵ, (0<ϵ⩽1), our algorithm runs in O(log log n) time.

11.

Optimal algorithms for the channel-assignment problem on a reconfigurable array of processors with wider bus networks

Shi-Jinn Horng Horng-Ren Tsai Yi Pan Seitzer J. 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(11):1124-1138

The computation model on which the algorithms are developed is the reconfigurable array of processors with wider bus networks (abbreviated to RAPWBN). The main difference between the RAPWBN model and other existing reconfigurable parallel processing systems is that the bus width of each network is bounded within the range [2, [√N]]. Such a strategy not only saves silicon area on the chip and increases the computational power enormously, but also allows the execution speed of the proposed algorithms to be tuned by the bus bandwidth. To demonstrate the computational power of the RAPWBN, the channel-assignment problem is derived in this paper. For the channel-assignment problem with N pairs of components, we first design an O(T + [N/ω]) time parallel algorithm using 2N processors with a 2N-row by 2N-column bus network, where the bus width of each bus network is ω-bit for 2 ⩽ ω ⩽ [√N] and T = [log_ω N] + 1. By tuning the bus bandwidth to the natural log N-bit and the extended N^{1/c}-bit (N^{1/c} > log N) for any constant c with c ⩾ 1, two more results which run in O(log N/log log N) and O(1) time, respectively, are also derived. When compared to the algorithms proposed by Olariu et al. [17] and Lin [14], it is shown that our algorithm runs in the equivalent time complexity while significantly reducing the number of processors to O(N).

12.

Parallel dynamic programming

Huang S.-H.S. Hongfei Liu Viswanathan V. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):326-328

Recurrence formulations for various problems, such as finding an optimal order of matrix multiplication, finding an optimal binary search tree, and optimal triangulation of polygons, assume a similar form. A. Gibbons and W. Rytter (1988) gave a CREW PRAM algorithm to solve such dynamic programming problems. The algorithm uses O(n⁶/log n) processors and runs in O(log² n) time. In this article, a modified algorithm is presented that reduces the processor requirement to O(n⁶/log⁵ n) while maintaining the same time complexity of O(log² n).
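The "similar form" shared by these problems is an interval recurrence of the shape c(i, j) = min over split points k of c(i, k) + c(k+1, j) + f(i, k, j). As a hedged illustration, the Python sketch below evaluates it sequentially for the matrix multiplication ordering problem in O(n³) time. It is the standard textbook dynamic program, not the CREW PRAM algorithm discussed above. The function name `matrix_chain_cost` and the `dims` convention (matrix t has shape dims[t] × dims[t+1]) are assumptions made for illustration.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications to multiply a matrix chain.

    Matrix t (0-based) has shape dims[t] x dims[t + 1].
    """
    n = len(dims) - 1  # number of matrices in the chain
    if n <= 0:
        return 0
    # cost[i][j] = cheapest way to multiply matrices i..j inclusive.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]
```

Optimal binary search trees and polygon triangulation fit the same loop structure with a different combining term f(i, k, j).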

13.

Deterministic Communication in Radio Networks with Large Labels

Leszek Gasieniec Aris Pagourtzis Igor Potapov Tomasz Radzik 《Algorithmica》2007,47(1):97-117

We study deterministic gossiping in ad hoc radio networks with large node labels. The labels (identifiers) of the nodes come from a domain of size N which may be much larger than the size n of the network (the number of nodes). Most of the work on deterministic communication has been done for the model with small labels which assumes N = O(n). A notable exception is Peleg's paper, where the problem of deterministic communication in ad hoc radio networks with large labels is raised and a deterministic broadcasting algorithm is proposed, which runs in O(n² log n) time for N polynomially large in n. The O(n log² n)-time deterministic broadcasting algorithm for networks with small labels given by Chrobak et al. implies deterministic O(n log N log n)-time broadcasting and O(n² log² N log n)-time gossiping in networks with large labels. We propose two new deterministic gossiping algorithms for ad hoc radio networks with large labels, which are the first such algorithms with subquadratic time for polynomially large N. More specifically, we propose: a deterministic O(n^{3/2} log² N log n)-time gossiping algorithm for directed networks; and a deterministic O(n log² N log² n)-time gossiping algorithm for undirected networks.

14.

Multiway merging in parallel

Zhaofang Wen 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(1):11-17

The problem of merging k (k⩾2) sorted lists is considered. We give an optimal parallel algorithm which takes O((n log k/p)+log n) time using p processors on a parallel random access machine that allows concurrent reads and exclusive writes, where n is the total size of the input lists. This algorithm achieves O(log n) time using p=n log k/log n processors. Most of the previous research for this problem has been focused on the case when k=2. Very recently, parallel solutions for the case when k=2 have been reported. Our solution is the first logarithmic time optimal parallel algorithm for the problem when k⩾2. It can also be seen as a unified optimal parallel algorithm for sorting and merging. In order to support the algorithm, a new processor assignment strategy is also presented.

15.

Book Embeddability of Series–Parallel Digraphs

Emilio Di Giacomo Walter Didimo Giuseppe Liotta Stephen K. Wismath 《Algorithmica》2006,45(4):531-547

In this paper we deal with the problem of computing upward two-page book embeddings of Two Terminal Series-Parallel (TTSP) digraphs, which are a subclass of series-parallel digraphs. An optimal O(n) time and space algorithm to compute an upward two-page book embedding of a TTSP-digraph with n vertices is presented. A previous algorithm of Alzohairi and Rival [1] runs in O(n³) time and assumes that the input series-parallel digraph does not have transitive edges. An application of this result to a computational geometry problem is also discussed. More precisely, upward two-page book embeddings are used to deal with the upward point-set embeddability problem, i.e., the problem of mapping planar digraphs onto a given set of points in the plane so that all edges are monotonically increasing in a common direction. The equivalence between upward two-page book embeddability and upward point-set embeddability with at most one bend per edge on any given set of points is proved. An O(n log n)-time algorithm for computing an upward point-set embedding with at most one bend per edge for TTSP-digraphs is presented.

16.

Distortion-Free Steganography for Polygonal Meshes

Alexander Bogomjakov Craig Gotsman Martin Isenburg 《Computer Graphics Forum》2008,27(2):637-642

We present a technique for steganography in polygonal meshes. Our method hides a message in the indexed representation of a mesh by permuting the order in which faces and vertices are stored. The permutation is relative to a reference ordering that encoder and decoder derive from the mesh connectivity in a consistent manner. Our method is distortion-free because it does not modify the geometry of the mesh. Compared to previous steganographic methods for polygonal meshes our capacity is up to an order of magnitude better. Our steganography algorithm is universal and can be used instead of the standard permutation steganography algorithm on arbitrary datasets. The standard algorithm runs in Ω(n² log² n log log n) time and achieves optimal O(n log n) bit capacity on datasets with n elements. In contrast, our algorithm runs in O(n) time, achieves a capacity that is only one bit per element less than optimal, and is extremely simple to implement.
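The idea of hiding a message in the storage order of elements can be sketched with the generic factorial-number-system encoding, shown below in Python. This illustrates permutation steganography in general, with sorted order standing in for the reference ordering; it is not the O(n) method of the paper, and it makes no attempt to match the running times quoted above. The function names `encode` and `decode` and the use of a non-negative integer as the message are assumptions made for illustration. The message must be smaller than n! for n distinct items.

```python
def encode(message, items):
    """Return a permutation of distinct `items` that hides integer `message`.

    Requires 0 <= message < factorial(len(items)).
    """
    reference = sorted(items)  # ordering both sides can recompute
    permutation = []
    for remaining in range(len(reference), 0, -1):
        # Peel off one mixed-radix digit and use it to pick the next item.
        message, index = divmod(message, remaining)
        permutation.append(reference.pop(index))
    return permutation


def decode(permutation):
    """Recover the integer hidden by `encode` from the stored order."""
    reference = sorted(permutation)
    digits = []
    for item in permutation:
        index = reference.index(item)
        digits.append((index, len(reference)))
        reference.pop(index)
    # Rebuild the integer from its mixed-radix digits, last digit first.
    message = 0
    for index, base in reversed(digits):
        message = message * base + index
    return message
```

A set of n distinct items carries log₂(n!) bits this way, which is the O(n log n) capacity mentioned in the abstract.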

17.

Efficient algorithms for globally optimal trajectories (cited by: 3; self-citations: 0; citations by others: 3)

Tsitsiklis J.N. 《Automatic Control, IEEE Transactions on》1995,40(9):1528-1538

We present serial and parallel algorithms for solving a system of equations that arises from the discretization of the Hamilton-Jacobi equation associated with a trajectory optimization problem of the following type. A vehicle starts at a prespecified point x_0 and follows a unit speed trajectory x(t) inside a region in ℛ^m until an unspecified time T at which the region is exited. A trajectory minimizing a cost function of the form ∫₀^T r(x(t))dt+q(x(T)) is sought. The discretized Hamilton-Jacobi equation corresponding to this problem is usually solved using iterative methods. Nevertheless, assuming that the function r is positive, we are able to exploit the problem structure and develop one-pass algorithms for the discretized problem. The first algorithm resembles Dijkstra's shortest path algorithm and runs in time O(n log n), where n is the number of grid points. The second algorithm uses a somewhat different discretization and borrows some ideas from a variation of Dial's shortest path algorithm (1969) that we develop here; it runs in time O(n), which is the best possible, under some fairly mild assumptions. Finally, we show that the latter algorithm can be efficiently parallelized: for two-dimensional problems and with p processors, its running time becomes O(n/p), provided that p=O(√n/log n).

18.

Optimal Sublogarithmic Time Parallel Algorithms on Rooted Forests

G. Sajith S. Saxena 《Algorithmica》2000,27(2):187-197

The problem of finding a sublogarithmic time optimal parallel algorithm for 3-colouring rooted forests has long been open. We settle this problem by obtaining an O((log log n) log*(log* n)) time optimal parallel algorithm on a TOLERANT Concurrent Read Concurrent Write (CRCW) Parallel Random Access Machine (PRAM). Furthermore, we show that if f(n) is the running time of the best known algorithm for 3-colouring a rooted forest on a COMMON or TOLERANT CRCW PRAM, a fractional independent set of the rooted forest can be found in O(f(n)) time with the same number of processors, on the same model. Using these results, it is shown that decomposable top-down algebraic computation and, hence, depth computation (ranking), 2-colouring and prefix summation on rooted forests can be done in O(log n) optimal time on a TOLERANT CRCW PRAM. These algorithms have been obtained by proving a result of independent interest, one concerning the self-simulation property of TOLERANT: an N-processor TOLERANT CRCW PRAM that uses an address space of size O(N) only, can be simulated on an n-processor TOLERANT PRAM in O(N/n) time, with no asymptotic increase in space or cost, when n=O(N/log log N). Received May 20, 1997; revised June 15, 1998.

19.

Parallel algorithms for relational coarsest partition problems (cited by: 2; self-citations: 0; citations by others: 2)

Rajasekaran S. Lee I. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(7):687-699

Relational Coarsest Partition Problems (RCPPs) play a vital role in verifying concurrent systems. It is known that RCPPs are P-complete and hence it may not be possible to design polylog time parallel algorithms for these problems. In this paper, we present two efficient parallel algorithms for RCPP in which its associated labeled transition system is assumed to have m transitions and n states. The first algorithm runs in O(n^{1+ϵ}) time using m/n^ϵ CREW PRAM processors, for any fixed ϵ<1. This algorithm is analogous to and optimal with respect to the sequential algorithm of P.C. Kanellakis and S.A. Smolka (1990). The second algorithm runs in O(n log n) time using m/n CREW PRAM processors. This algorithm is analogous to and nearly optimal with respect to the sequential algorithm of R. Paige and R.E. Tarjan (1987).

20.

An O(2^{O(k)} n³) FPT Algorithm for the Undirected Feedback Vertex Set Problem

Frank Dehne Michael Fellows Michael Langston Frances Rosamond Kim Stevens 《Theory of Computing Systems》2007,41(3):479-492

We describe an algorithm for the Feedback Vertex Set problem on undirected graphs, parameterized by the size k of the feedback vertex set, that runs in time O(c^k n³) where c = 10.567 and n is the number of vertices in the graph. The best previous algorithms were based on the method of bounded search trees, branching on short cycles. The best previous running time of an FPT algorithm for this problem, due to Raman, Saurabh and Subramanian, has a parameter function of the form 2^{O(k log k/log log k)}. Whether an exponentially linear in k FPT algorithm for this problem is possible has been previously noted as a significant challenge. Our algorithm is based on the new FPT technique of iterative compression. Our result holds for a more general form of the problem, where a subset of the vertices may be marked as forbidden to belong to the feedback set. We also establish "exponential optimality" for our algorithm by proving that no FPT algorithm with a parameter function of the form O(2^{o(k)}) is possible, unless there is an unlikely collapse of parameterized complexity classes, namely FPT = M[1].