期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An optimal time bound for oblivious routing

Ian Parberry 《Algorithmica》1990,5(1):243-250

The problem of routing data packets in a constant-degree network is considered. A routing scheme is calledoblivious if the route taken by each packet is uniquely determined by its source and destination. The time required for the oblivious routing ofn packets onn processors is known to be (n). It is demonstrated that the presence of extra processors can expedite oblivious routing. More specifically, the time required for the oblivious routing ofn packets onp processors is (n/p + logn). 相似文献

2.

Packet routing on grids of processors

Manfred Kunde 《Algorithmica》1993,9(1):32-46

The problem of routing packets onn ₁×...×n _r mesh-connected arrays or grids of processors is studied. The focus of this paper is on permutation routing where each processor contains exactly one packet initially and finally. A slight modification of permutation routing called balanced routing is also discussed. For two-dimensional grids a determinisitc routing algorithm is given forn×n meshes where each processor has a buffer of size f(n) < n. It needs 2n + O(n/f(n)) steps on grids without wrap-arounds. Hence, it is asymptoticaliy nearly optimal, and as good as randomized algorithms routing data only with high probability. Furthermore, it is demonstrated that onr-dimensional cubes of processors permutation routing can be performed asymptotically by (2r–2)n steps, which is faster than the running times of so-far known randomized algorithms and of deterministic algorithms.Partially supported by Siemens AG, München. 相似文献

3.

Parallel construction of a suffix tree with applications

A. Apostolico C. Iliopoulos G. M. Landau B. Schieber U. Vishkin 《Algorithmica》1988,3(1-4):347-365

Many string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string ofn symbols inO(logn) time withn processors. The algorithm requires Θ(n ²) space. However, the space needed can be reduced toO(n ^1+?) for any 0< ? ≤1, with a corresponding slow-down proportional to 1/?. Efficient parallel procedures are also given for some string problems that can be solved with suffix trees. 相似文献

4.

A bidirectional shortest-path algorithm with good average-case behavior

Michael Luby Prabhakar Ragde 《Algorithmica》1989,4(1-4):551-567

The two-terminal shortest-path problem asks for the shortest directed path from a specified nodes to a specified noded in a complete directed graphG onn nodes, where each edge has a nonnegative length. We show that if the length of each edge is chosen independently from the exponential distribution, and adjacency lists at each node are sorted by length, then a priority-queue implementation of Dijkstra's unidirectional search algorithm has the expected running time Θ(n logn). We present a bidirectional search algorithm that has expected running time Θ(√n logn). These results are generalized to apply to a wide class of edge-length distributions, and to sparse graphs. If adjacency lists are not sorted, bidirectional search has the expected running time Θ(a√n) on graphs of average degreea, as compared with Θ(an) for unidirectional search. 相似文献

5.

Concept of a multi-processor parallel processing unit

Prof. Dr. R. F. Albrecht 《Computing》1980,25(1):1-16

The logical concept of a processing system withm, 1<m≤n, identical or different processors is given where the processors operate onp, p≤n subprograms which are parts of a program for the system. As many of the processors as possible work in parallel, and as soon as a processor has terminated its process this processor is automatically turned over to another program to compute. 相似文献

6.

Parallel integer sorting and simulation amongst CRCW models

Sanjeev Saxena 《Acta Informatica》1996,33(5):607-619

In this paper a general technique for reducing processors in simulation without any increase in time is described. This results in an O(√logn) time algorithm for simulating one step of PRIORITY on TOLERANT with processor-time product of O(n log logn); the same as that for simulating PRIORITY on ARBITRARY. This is used to obtain anO(logn/log logn + √logn (log logm ? log logn)) time algorithm for sortingn integers from the set {0,...,m ? 1},m ≧n, with a processor-time product ofO(n log logm log logn) on a TOLERANT CRCW PRAM. New upper and lower bounds for ordered chaining problem on an allocated COMMON CRCW model are also obtained. The algorithm for ordered chaining takesO(logn/log logn) time on an allocated PRAM of sizen. It is shown that this result is best possible (upto a constant multiplicative factor) by obtaining a lower bound of Ω(r logn/(logr + log logn)) for finding the first (leftmost one) live processor on an allocated-COMMON PRAM of sizen ofr-slow virtual processors (one processor simulatesr processors of allocated PRAM). As a result, for ordered chaining problem, “processor-time product” has to be at least Ω(n logn/log logn) for any poly-logarithmic time algorithm. Algorithm for ordered-chaining problem results in anO(logN/log logN) time algorithm for (stable) sorting ofn integers from the set {0,...,m ? 1} withn-processors on a COMMON CRCW PRAM; hereN = max(n, m). In particular if,m =n ^O(1), then sorting takes Θ(logn/log logn) time on both TOLERANT and COMMON CRCW PRAMs. Processor-time product for TOLERANT isO(n(log logn)²). Algorithm for COMMON usesn processors. 相似文献

7.

Oblivious Routing Algorithms on the Mesh of Buses

《Journal of Parallel and Distributed Computing》2000,60(2):137-149

An optimal ⌈1.5N^1/2⌉ lower bound is shown for oblivious routing on the mesh of buses: a two-dimensional parallel model consisting of N^1/2×N^1/2 processors and N^1/2 row and N^1/2 column buses but no local connections between neighboring processors. Many lower bound proofs for routing on mesh-structured models use a single instance (adversary) which includes difficult packet-movement. This approach does not work in our case; our proof is one of the rare cases which really exploit the fact that the routing algorithm has to cope with many different instances. Note that the two-dimensional mesh of buses includes 2N^1/2 buses and each processor can access two different buses. Apparently the three-dimensional model provides more communication facilities, namely including 3N^2/3 buses, and each processor can access three different buses. Surprisingly, however, the oblivious routing on the three-dimensional mesh of buses needs more time, i.e., Ω(N^2/3) steps, which is another important result of this paper. 相似文献

8.

Scaleable Parallel Algorithms for Lower Envelopes with Applications

Laurence Boxer Russ Miller Andrew Rau-Chaplin 《Journal of Parallel and Distributed Computing》1998,53(2):1297

相似文献

9.

A minimum-area circuit forl-selection

Pavol Ďuriš Ondrej Sýkora Clark D. Thompson Imrich Vrto 《Algorithmica》1987,2(1-4):251-265

We prove tight upper and lower bounds on the area of semelective, when-oblivious VLSI circuits for the problem ofl-selection. The area required to select thelth smallest ofn k-bit integers is found to be heavily dependent on the relative sizes ofl,k, andn. Whenl<2^k, the minimal area isA = Θ(minn,l(k-logl)). Whenl≥2^k,A = Θ(2^k(logl-k + 1)). 相似文献

10.

A faster divide-and-conquer algorithm for constructing delaunay triangulations 总被引：2，自引：0，他引：2

Rex A. Dwyer 《Algorithmica》1987,2(1-4):137-151

An easily implemented modification to the divide-and-conquer algorithm for computing the Delaunay triangulation ofn sites in the plane is presented. The change reduces its Θ(n logn) expected running time toO(n log logn) for a large class of distributions that includes the uniform distribution in the unit square. Experimental evidence presented demonstrates that the modified algorithm performs very well forn≤2¹⁶, the range of the experiments. It is conjectured that the average number of edges it creates—a good measure of its efficiency—is no more than twice optimal forn less than seven trillion. The improvement is shown to extend to the computation of the Delaunay triangulation in theL _p metric for 1<p≤∞. 相似文献

11.

Constant-time hough transform on the processor arrays with reconfigurable bus systems

Dr. S. -S. Lin 《Computing》1994,52(1):1-15

We develop constant-time algorithms to compute the Hough transform on a processor array with a reconfigurable bus system (abbreviated to PARBS). The PARBS is a comptuation model which consists of a processor array and a reconfigurable bus system. It is a very powerful computation model in that many problems can be solved efficiently. In this paper, we introduce the concept of iterative-PARBS which is similar to the FOR-loop construct in sequential programming languages. The iterative-PARBS is a building block in which the processing data can be routed through it several times. We can think it as a “hardware subroutine”. Based on this scheme, we are able to explore constant-time Hough transform algorithms on PARBS. The following new results are derived in this study:

The sum ofn bits can be computed in O(1) times on a PARBS with O(n ^1+?) processors for any fixed ?>0.
The weights of each simple path of ann*n image can be computed in O(1) time on a 3-D PARBS with O(n ^2+?) processors for any fixed ?>0.
Thep angle Hough transform of ann*n image can be computed in O(1) time on a PARBS with O(p*n ^2+?) processors for any fixed ?>0 withp copies of the image pretiled.
Thep angle Hough transform of ann*n image can be computed in O(1) time on a PARBS with O(p*n ³) processors.

相似文献

12.

Potential Function Analysis of Greedy Hot-Potato Routing

A. Ben-Dor S. Halevi A. Schuster 《Theory of Computing Systems》1998,31(1):41-61

We study the problem of packet routing in synchronous networks. We put forward a notion of greedy hot-potato routing algorithms and devise tech- niques for analyzing such algorithms. A greedy hot-potato routing algorithm is one where: • The processors have no buffer space for storing delayed packets. Therefore, each packet must leave any intermediate processor at the step following its arrival. • Packets always advance toward their destination if they can. Namely, a packet must leave its current intermediate node via a link which takes it closer to its destination, unless all these links are taken by other packets. Moreover, in this case all these other packets must advance toward their destinations. We use potential function analysis to obtain an upper bound of O(n k ^1/2 ) on the running time of a wide class of algorithms in the two-dimensional n × n mesh, for routing problems with a total of k packets. The same techniques can be generalized to obtain an upper bound of O(exp(d) n^d-1 k ^1/d ) on the running time of a wide class of algorithms in the d -dimensional n ^d mesh, for routing problems with a total of k packets. Received December 1993, and in final form March 1997. 相似文献

13.

Adaptive Drift Analysis

Benjamin Doerr Leslie Ann Goldberg 《Algorithmica》2013,65(1):224-250

We show that, for any c>0, the (1+1) evolutionary algorithm using an arbitrary mutation rate p _n=c/n finds the optimum of a linear objective function over bit strings of length n in expected time Θ(nlogn). Previously, this was only known for c≤1. Since previous work also shows that universal drift functions cannot exist for c larger than a certain constant, we instead define drift functions which depend crucially on the relevant objective functions (and also on c itself). Using these carefully-constructed drift functions, we prove that the expected optimisation time is Θ(nlogn). By giving an alternative proof of the multiplicative drift theorem, we also show that our optimisation-time bound holds with high probability. 相似文献

14.

Optimal routing algorithms for mesh-connected processor arrays

Sanguthevar Rajasekaran Thanasis Tsantilas 《Algorithmica》1992,8(1-6):21-38

We show that there is a randomizedoblivious algorithm for routing any (partial) permutation on ann ×n grid in 2n +O(logn) parallel communication steps. The queues will not grow larger than Θ(logn/log logn) with high probability. We then modify this to obtain a (nonoblivious) algorithm with the same running time such that the size of the queues is bounded by a constant with high probability. For permutations withlocality, where each packet has to travel a distance at mostL, a generalization of the algorithm routes in time proportional toL with high probability. Finally, we identify a class of meshlike networks that have optimal or near-optimal diameter. These meshes have the potential of being adapted to run existing sorting and routing algorithms with corresponding reduction in their running times. 相似文献

15.

Bounds for nonpreemptive scheduling of jobs with similar processing times on multiprocessor systems using the LPT-Algorithm

H. Kellerer 《Computing》1991,46(3):183-191

The well-known, NP-complete problem of scheduling a set ofn independent jobs nonpreemptively onm identical parallel processors to minimize the maximum finish time is considered. Let ω₀ be the finish time of an optimal schedule and ω the finish time of a schedule found by the Longest Processing Time (LPT-)heuristic. We will improve the Graham-bound for the LPT-heuristic (ω/ω₀ ≤ 4/3 ? 1/3m) which is tight in general, by considering only jobs with similar processing times. 相似文献

16.

Randomized Priority Queues for Fast Parallel Access

《Journal of Parallel and Distributed Computing》1998,49(1):86-97

We present simple randomized algorithms for parallel priority queues on distributed memory machines. Inserting O(n) elements or deleting the O(n) out ofmsmallest elements usingnprocessors requires O(T_coll+ log(m/n)) amortized time with high probability whereT_collbounds the time for performing prefix sums and randomized routing. The memory requirement is bounded by (m/n)(1 +o(1)) + O(logn) whp. These bounds are an improvement over the best previously known algorithms for many interconnection networks and even matches the speed of the best known PRAM algorithms. Generalizations for accessing thek⪢nsmallest elements are even more efficient. A portable implementation using MPI demonstrates that our approach is already useful for medium scale parallelism. Two parallel selection algorithms for randomly placed data are a spin-off. One runs in time O(T_coll) with high probability, beating a lower bound for the worst case. The other requires only a single reduction operation. 相似文献

17.

Processor-time optimal parallel algorithms for digitized images on mesh-connected processor arrays

Hussein M. Alnuweiri V. K. Prasanna Kumar 《Algorithmica》1991,6(1-6):698-733

We present processor-time optimal parallel algorithms for several problems onn ×n digitized image arrays, on a mesh-connected array havingp processors and a memory of sizeO(n ²) words. The number of processorsp can vary over the range [1,n ^3/2] while providing optimal speedup for these problems. The class of image problems considered here includes labeling the connected components of an image; computing the convex hull, the diameter, and a smallest enclosing box of each component; and computing all closest neighbors. Such problems arise in medium-level vision and require global operations on image pixels. To achieve optimal performance, several efficient data-movement and reduction techniques are developed for the proposed organization. 相似文献

18.

Geometric applications of a matrix-searching algorithm

Alok Aggarwal Maria M. Klawe Shlomo Moran Peter Shor Robert Wilber 《Algorithmica》1987,2(1-4):195-208

LetA be a matrix with real entries and letj(i) be the index of the leftmost column containing the maximum value in rowi ofA.A is said to bemonotone ifi ₁ >i ₂ implies thatj(i ₁) ≥J(i ₂).A istotally monotone if all of its submatrices are monotone. We show that finding the maximum entry in each row of an arbitraryn xm monotone matrix requires Θ(m logn) time, whereas if the matrix is totally monotone the time is Θ(m) whenm≥n and is Θ(m(1 + log(n/m))) whenm<n. The problem of finding the maximum value within each row of a totally monotone matrix arises in several geometric algorithms such as the all-farthest-neighbors problem for the vertices of a convex polygon. Previously only the property of monotonicity, not total monotonicity, had been used within these algorithms. We use the Θ(m) bound on finding the maxima of wide totally monotone matrices to speed up these algorithms by a factor of logn. 相似文献

19.

An optimal visibility graph algorithm for triangulated simple polygons

John Hershberger 《Algorithmica》1989,4(1-4):141-155

LetP be a triangulated simple polygon withn sides. The visibility graph ofP has an edge between every pair of polygon vertices that can be connected by an open segment in the interior ofP. We describe an algorithm that finds the visibility graph ofP inO(m) time, wherem is the number of edges in the visibility graph. Becausem can be as small asO(n), the algorithm improves on the more general visibility algorithms of Asanoet al. [AAGHI] and Welzl [W], which take Θ(n ²) time, and on Suri'sO(m logn) visibility graph algorithm for simple polygons [S]. 相似文献

20.

Communication—Processor Tradeoffs in a Limited Resources PRAM

Adnan Agbaria Yosi Ben-Asher Ilan Newman 《Algorithmica》2002,34(3):276-297

We consider a simple restriction of the PRAM model (called PPRAM), where the input is arbitrarily partitioned between a fixed set of p processors and the shared memory is restricted to m cells. This model allows for investigation of the tradeoffs/ bottlenecks with respect to the communication bandwidth (modeled by the shared memory size m ) and the number of processors p . The model is quite simple and allows the design of optimal algorithms without losing the effect of communication bottlenecks. We have focused on the PPRAM complexity of problems that have $\tilde{O}$ (n) sequential solutions (where n is the input size), and where m ≤ p ≤ n . We show essentially tight time bounds (up to logarithmic factors) for several problems in this model such as summing, Boolean threshold, routing, integer sorting, list reversal and k -selection. We get typically two sorts of complexity behaviors for these problems: One type is $\tilde{O}$ (n/p + p/m) , which means that the time scales with the number of processors and with memory size (in appropriate ranges) but not with both. The other is $\tilde{O}$ (n/m) , which means that the running time does not scale with p and reflects a communication bottleneck (as long as m < p ). We are not aware of any problem whose complexity scales with both p and m (e.g. $O(n/\sqrt{m \cdot p})$ ). This might explain why in actual implementations one often fails to get p -scalability for p close to n . 相似文献