首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 906 毫秒
1.
In this paper, we present randomized algorithms for selection on the hypercube. We identify two variants of the hypercube, namely, thesequential modeland theparallel model. In the sequential model, any node at any time can handle only communication along a single incident edge, whereas in the parallel model a node can communicate along all its incident edges at the same time. We specify three variations of the parallel model and present optimal randomized algorithms on all these three versions of parallel model. In particular, we show that selection on an input of sizencan be performed on ap-node hypercube in timeO((n/p) + logp) with high probability, on any of the three versions of the parallel model. This result is important in view of a lower bound that implies that selection needs Ω((n/p)log logp+ logp) time on ap-node sequential hypercube. We modify our selection algorithm to run on the sequential hypercube in which case it runs in an expected time nearly matching this lower bound. For the special case whenn=p, our selection algorithm runs in an optimalO(logn) time on the sequential hypercube. Our algorithms are very simple and are most likely to perform well in practice.  相似文献   

2.
We present optimal embeddings of three genres of butterfly-like graphs in the (boolean) hypercube; each embedding is specified via a linear-time algorithm. Our first embedding finds an instance of the FFT graph as a subgraph of the smallest hypercube that is big enough to hold it; thus, we embed then-level FFT graph, which has (n+1)2 n vertices, in the (n+log2(n+1))-dimensional hypercube, with unit dilation. This embedding yields a mapping of the pipelined FFT algorithm on the hypercube architecture, which is optimal in all resources (time, processor utilization, load balancing, etc.) and which is on-line in the sense that inputs can be added to the transform even during the computation. Second, we find optimal embeddings of then-level butterfly graph and then-level cube-connected cycles graph, each of which hasn2 n vertices, in the (n+log2 n)-dimensional hypercube. These embeddings, too, have optimal dilation, congestion, and expansion. The dilation is 1+(n mod 2), which is best possible. Our embeddings indicate that these two bounded-degree approximations to the hypercube do not have any communication power that is not already present in the hypercube.The research of D. S. Greenberg was supported in part by NSF Grant MIP-86-01885. The research of L. S. Heath was supported in part by NSF Grant DCI-85-04308. The research of A. L. Rosenberg was supported in part by NSF Grants DCI-85-04308, DCI-87-96236, and CCR-88-12567.  相似文献   

3.
The main result of this paper is an algorithm for embeddingk-ary complete trees into hypercubes with uniform load and asymptotically optimal dilation. The algorithm is fully scalable—the dimension of the hypercube can be chosen independent of the arity and height of the tree. The embedding enables to emulate optimally both Divide&Conquer computations onk-ary complete trees where only one level of nodes is active at a time and general computations based onk-ary complete trees where all tree nodes are active simultaneously. As a special case, we get an algorithm for embedding thek-ary complete tree of given height into its optimal hypercube with load 1 and asymptotically optimal dilation.  相似文献   

4.
This paper presented a routing algorithm that finds n disjoint shortest paths from the source node s to target node d in the n-dimensional hypercube. Fault-tolerant routing over all shortest node-disjoint paths has been investigated to overcome the failure encountered during routing in hypercube networks. In this paper, we proposed an efficient approach to provide fault-tolerant routing which has been investigated on hypercube networks. The proposed approach is based on all shortest node-disjoint paths concept in order to find a fault-free shortest path among several paths provided. The proposed algorithm is a simple uniform distributed algorithm that can tolerate a large number of process failures, while delivering all n messages over optimal-length disjoint paths. However, no distributed algorithm uses acknowledgement messages (acks) for fault tolerance. So, for dealing the faults, acknowledgement messages (acks) are included in the proposed algorithm for routing messages over node-disjoint paths in a hypercube network.  相似文献   

5.
In this paper we present a simple parallel sorting algorithm and illustrate its application in general sorting, disk sorting, and hypercube sorting. The algorithm (called the (l,m) -mergesort (LMM)) is an extension of the bitonic and odd—even mergesorts. Literature on parallel sorting is abundant. Many of the algorithms proposed, though being theoretically important, may not perform satisfactorily in practice owing to large constants in their time bounds. The algorithm presented in this paper has the potential of being practical. We present an application to the parallel disk sorting problem. The algorithm is asymptotically optimal (assuming that N is a polynomial in M , where N is the number of records to be sorted and M is the internal memory size). The underlying constant is very small. This algorithm performs better than the disk-striped mergesort (DSM) algorithm when the number of disks is large. Our implementation is as simple as that of DSM (requiring no fancy data structures or prefetch techniques.) As a second application, we prove that we can get a sparse enumeration sort on the hypercube that is simpler than that of the classical algorithm of Nassimi and Sahni [16]. We also show that Leighton's columnsort algorithm is a special case of LMM. Online publication December 26, 2000.  相似文献   

6.
In this paper we describe anO(logN)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of on-line circuit switching in anO(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping).Our algorithm is adaptive and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least (log2 N/log logN) bit steps with high probability for almost all permutations.This research was supported by the Defense Advanced Research Projects Agency under Contracts N00014-87-K-825 and N00014-89-J-1988, the Air Force under Contract AFOSR-89-0271, and the Army under Contract DAAL-03-86-K-0171. This work was completed while the third and fourth authors were at the Laboratory for Computer Science, Massachusetts Institute of Technology.  相似文献   

7.
8.
A distributed routing algorithm for faulty hypercubes is described. This algorithm uses a directed depth-first approach to find a path between the sender and receiver of a message whenever at least one non-faulty path exists. We show that, when an arbitrary number of elements of the hypercube can be faulty, the algorithm always routes messages using fewer than 2N hops, whereN is the number of nodes in the hypercube. This performance is shown to be within a factor of two of the optimal worst-case routing efficiency. Through foult simulations, we show that, even when up to half of the elements in the cube are faulty, complete the analysis, we prove that our algorithm is deadlock-free. Finally, we present two extensions of the algorithm. The first uses local storage to reduce the overhead of the algorithm while the second allows reliable broadcasting in the presence of an arbitrary number of faults.Supported in part by the National Science Foundation under Grant CCR-9010547.Supported in part by the National Science Foundation Instrumentation Grant CDA-8820627.  相似文献   

9.
We present a parallel algorithm for computing an optimal sequence alignment in efficient space. The algorithm is intended for a message-passing architecture with one-dimensional-array topology. The algorithm computes an optimal alignment of two sequences of lengthsM andN inO((M+N) 2 /P) time andO((M+N)/P) space per processor, where the number of processors isP>=max(M, N). Thus, whenP=max(M, N) it achieves linear speedup and requires constant space per processor. Some experimental results on an Intel hypercube are provided.This research was supported by NIH Grant LM05110 from the National Library of Medicine.  相似文献   

10.
To effectively utilize a faulty hypercube, it is often necessary to reconfigure the hypercube in such a way as to retain as many fault-free nodes as possible. This inspires us to identify maximalincomplete subcubesin a faulty hypercube, as the subcube so reconfigured is often much larger than any complete subcube obtainable and is likely to retain performance better. An efficient algorithm is first presented to find incomplete subcubes in a faulty hypercube. Three applications are then implemented on both an incomplete system and a complete one to measure their actual performance differences. Each application is mapped onto incomplete hypercubes, following the techniques developed for complete hypercubes (possibly with some modifications). Among the three applications,Gaussian eliminationtakes exactly the same mapping scheme on an incomplete system as on a complete one, whileFFTrequires some efforts to adapt it to the incomplete topology. The measured results of the three applications indicate that reconfiguring a faulty hypercube into an incomplete subcube is beneficial in practice.  相似文献   

11.
Ordering clones from a genomic library into physical maps of whole chromosomes presents a pivotal computational problem in genetics. Previous research has shown the physical mapping problem to be isomorphic to the NP-complete Optimal Linear Arrangement (OLA) problem for which no polynomial-time algorithm for determining the optimal solution is known. Serial implementations of stochastic global optimization techniques such as simulated annealing yielded very good results but proved computationally intensive. The design, analysis and implementation of coarse-grained parallel MIMD algorithms for simulated annealing on the Intel iPSC/860 hypercube is presented. Data decomposition and control decomposition strategies based on Markov chain decomposition, perturbation methods and problem-specific annealing heuristics are proposed and applied to the physical mapping problem. A suite of parallel algorithms are implemented on an 8-node Intel iPSC/860 hypercube, exploiting the nearest-neighbor communication pattern on the Boolean hypercube topology. Convergence, speedup and scalability characteristics of the various parallel algorithms are analyzed and discussed. Results indicate a deterioration of performance when a single Markov chain of solution states is distributed across multiple processing elements in the Intel iPSC/860 hypercube.  相似文献   

12.
We consider the problem of dynamically allocating and deallocating local memory resources among multiple users in a parallel or distributed system. Given a group of independent users and a collection of interconnected local memory devices, we want to render the fragmentation of the memory resources irrelevant by allowing any user to allocate space for his or her purposes as long as there is space available anywhere in the system. In effect, we would like it to appear to the users as though they are allocating memory from a single central pool of memory, even though the space is distributed throughout the system. Our goal is to devise an on-line allocation algorithm that minimizes two cost measures: first, the fraction of unused space , which arises due to fragmentation of the memory; second, the slowdown needed by the system to service user requests, which arises due to the contention for access to the memory devices. We solve this distributed dynamic allocation problem in near-optimal fashion by devising an algorithm that allows the memory to be used to 100% of capacity despite the fragmentation and guarantees that service delays will always be within a constant factor of optimal. The algorithm is completely on-line (no foreknowledge of user activity is assumed) and can accommodate any sequence of allocations and deallocations by the users that does not violate global memory bounds. We also consider the distributed dynamic allocation problem in the more restrictive setting where the local memory devices are connected by a low-degree fixed-connection network, rather than being fully interconnected. In this case, communication costs must be more explicitly considered in our allocation algorithms. We give allocation algorithms for butterfly and hypercube networks, and prove necessary and sufficient conditions on the total amount of memory space needed for near-optimal algorithms to exist. Received November 5, 1996; revised December 10, 1997.  相似文献   

13.
We develop a parallel algorithm for partitioning the vertices of a graph intop2 sets in such a way that few edges connect vertices in different sets. The algorithm is intended for a message-passing multiprocessor system, such as the hypercube, and is based on the Kernighan-Lin algorithm for finding small edge separators on a single processor.(1) We use this parallel partitioning algorithm to find orderings for factoring large sparse symmetric positive definite matrices. These orderings not only reduce fill, but also result in good processor utilization and low communication overhead during the factorization. We provide a complexity analysis of the algorithm, as well as some numerical results from an Intel hypercube and a hypercube simulator.Publication of this report was partially supported by the National Science Foundation under Grant DCR-8451385 and by AT&T Bell Laboratories through their Ph.D scholarship program.  相似文献   

14.
Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. In this paper we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in O( log 3 n) time using O(n/ log 2 n) processors on the EREW PRAM and using n processors on the hypercubes. For the case of proper interval graphs, our algorithm runs in O( log n ) time using O(n) processors if the input intervals are not given already sorted and using O(n/ log n ) processors otherwise, on the EREW PRAM. On n -processor hypercubes, our algorithm for the proper interval case takes O( log n log log n ) time for unsorted input and O( log n ) time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. In addition, we present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs. Received November 20, 1995; revised September 3, 1998.  相似文献   

15.
This paper proposes a simple yet efficient algorithm to distribute loads evenly on multiprocessor computers with hypercube interconnection networks. This algorithm was developed based upon the well-known dimension exchange method. However, the error accumulation suffered by other algorithms based on the dimension exchange method is avoided by exploiting the notion of regular distributions, which are commonly deployed for data distributions in parallel programming. This algorithm achieves a perfect load balance over P processors with an error of 1 and the worst-case time complexity of O(M log2 P), where M is the maximum number of tasks initially assigned to each processor. Furthermore, perfect load balance is achieved over subcubes as well—once a hypercube is balanced, if the cube is decomposed into two subcubes by the lowest bit of node addresses, then the difference between the numbers of the total tasks of these subcubes is at most 1.  相似文献   

16.
In this paper, an algorithm is introduced that computes an arbitrarily fine approximation of the smallest enclosing ball of a point set in any dimension. This operation is important in, for example, classification, clustering, and data mining. The algorithm is very simple to implement, gives reliable results, and gracefully handles large problem instances in low and high dimensions, as confirmed by both theoretical arguments and empirical evaluation. For example, using a CPU with eight cores, it takes less than two seconds to compute a 1.001‐approximation of the smallest enclosing ball of one million points uniformly distributed in a hypercube in dimension 200. Furthermore, the presented approach extends to a more general class of input objects, such as ball sets.  相似文献   

17.
The n-dimensional twisted cube, denoted by TQ n , a variation of the hypercube, possesses some properties superior to the hypercube. In this paper, we show that every vertex in TQ n lies on a fault-free cycle of every length from 6 to 2 n , even if there are up to n?2 link faults. We also show that our result is optimal.  相似文献   

18.
We propose a new, low-cost fault-tolerant structure for the hypercube that employs spare processors and extra links. The target of the proposed structure is to fully tolerate the first faulty node, no matter where it occurs, and almost fully tolerate the second, meaning that the underlying hypercube topology can be resumed if the second faulty node occurs at most locations—expectantly 92% of locations. The unique features of our structure are that (1) it utilizes the unused extra link-ports in the processor nodes of the hypercube to obtain the proposed topology, so that minimum extra hardware is needed in constructing the fault-tolerant structure and (2) the structure's node-degrees are low as desired—the primary and spare nodes all have node-degrees of n + 2 for an n-dimensional hypercube. The number of spare nodes is one fourth of primary nodes. The reconfiguration algorithm in the presence of faults is elegant and efficient. The proposed structure also effectively enhances the diagnosability of the hypercube system. It is shown that the diagnosability of the structure is increased to n + 2, whereas an ordinary n-dimensional hypercube has diagnosability n.  相似文献   

19.
Abstract. We present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n nonintersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with high probability using O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of work done since the sequential time bound for this problem is Ω(n log n) . Our algorithm improves by an O(log n) factor the previously best known deterministic parallel algorithm, given by Goodrich, ó'Dúnlaing, and Yap, which runs in O( log 2 n) time using O(n) processors. We obtain this result by using a new ``two-stage' random sampling technique. By choosing large samples in the first stage of the algorithm, we avoid the hurdle of problem-size ``blow-up' that is typical in recursive parallel geometric algorithms. We combine the two-stage sampling technique with efficient search and merge procedures to obtain an optimal algorithm. This technique gives an alternative optimal algorithm for the Voronoi diagram of points as well (all other optimal parallel algorithms for this problem use the transformation to three-dimensional half-space intersection).  相似文献   

20.
We study the problem of minimizing the expected cost of binary searching for data where the access cost is not fixed and depends on the last accessed element, such as data stored in magnetic or optical disk. We present an optimal algorithm for this problem that finds the optimal search strategy in O(n 3 ) time, which is the same time complexity of the simpler classical problem of fixed costs. Next, we present two practical linear expected time algorithms, under the assumption that the access cost of an element is independent of its physical position. Both practical algorithms are online, that is, they find the next element to access as the search proceeds. The first one is an approximate algorithm which minimizes the access cost disregarding the goodness of the problem partitioning. The second one is a heuristic algorithm, whose quality depends on its ability to estimate the final search cost, and therefore it can be tuned by recording statistics of previous runs. We present an application for our algorithms related to text retrieval. When a text collection is large it demands specialized indexing techniques for efficient access. One important type of index is the suffix array, where data access is provided through an indirect binary search on the text stored in magnetic disk or optical disk. Under this cost model we prove that the optimal algorithm cannot perform better than Ω(1/ log n) times the standard binary search. We also prove that the approximate strategy cannot, on average, perform worse than 39% over the optimal one. We confirm the analytical results with simulations, showing improvements between 34% (optimal) and 60% (online) over standard binary search for both magnetic and optical disks. Received February 13, 1997; revised May 27, 1998.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号