首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Our approach combines the method of inexact steepest descent with the method of contractor directions to obtain an algorithm for solving systems of linear equations. In order to enhance the scope of applicability, we consider an iterative method with variable step-size iterations. We prove the convergence and given an error estimate for our method.

The algorithm is well-suited for parallel computation. In fact, for systems with m equations and n unknowns, each iteration may be computed in parallel time O(log m + log n), on an EREW PRAM with O(mn) processors.  相似文献   


2.
We show that the notoriously difficult problem of finding and reporting the smallest number of vertex-disjoint paths that cover the vertices of a graph can be solved time- and work-optimally for cographs. Our result implies that for this class of graphs the task of finding a Hamiltonian path can be solved time- and work-optimally in parallel.

It was open for more than 10 years to find a time- and work-optimal parallel solution for this important problem. Our contribution is to offer an optimal solution to this important problem. We begin by showing that any algorithm that solves an instance of size n of the problem must take Ω(log n) time on the CREW, even if an infinite number of processors are available. We then go on to show that this time lower bound is tight by devising an EREW algorithm that, given an n-vertex cograph G represented by its cotree, finds and reports all the paths in a minimum path cover in O(log n) time using n/log n processors.  相似文献   


3.
We substantially improve the known algorithms for approximating all the complex zeros of an nth degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc x : |x| ≤ 1, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2b, by using order of n arithmetic operations and order of (b + n)n2 Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n2 Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).  相似文献   

4.
We previously proved that almost all words of length n over a finite alphabet A with m letters contain as factors all words of length k(n) over A as n→∞, provided limsupn→∞ k(n)/log n<1/log m.

In this note it is shown that if this condition holds, then the number of occurrences of any word of length k(n) as a factor into almost all words of length n is at least s(n), where limn→∞ log s(n)/log n=0. In particular, this number of occurrences is bounded below by C log n as n→∞, for any absolute constant C>0.  相似文献   


5.
Given n points in the plane the planar dominance counting problem is to determine for each point the number of points dominated by it. Point p is said to dominate point q if x(q)x(p) and y(q)y(p), when x(p) and y(p) are the x− and y-coordinate of p, respectively. We present two CREW PRAM parallel algorithms for the problem, one running in O(log n loglog n) time and and the other in O(lognloglogn/logloglogn) time both using O(n) processors. Some applicationsare also given.  相似文献   

6.
ANTS: Agents on Networks, Trees, and Subgraphs   总被引:1,自引:0,他引:1  
Efficient exploration of large networks is a central issue in data mining and network maintenance applications. In most existing work there is a distinction between the active ‘searcher’ which both executes the algorithm and holds the memory and the passive ‘searched graph’ over which the searcher has no control at all. Large dynamic networks like the Internet, where the nodes are powerful computers and the links have narrow bandwidth and are heavily-loaded, call for a different paradigm, in which a noncentralized group of one or more lightweight autonomous agents traverse the network in a completely distributed and parallelizable way. Potential advantages of such a paradigm would be fault tolerance against network and agent failures, and reduced load on the busy nodes due to the small amount of memory and computing resources required by the agent in each node. Algorithms for network covering based on this paradigm could be used in today’s Internet as a support for data mining and network control algorithms. Recently, a vertex ant walk ( ) method has been suggested [I.A. Wagner, M. Lindenbaum, A.M. Bruckstein, Ann. Math. Artificial Intelligence 24 (1998) 211–223] for searching an undirected, connected graph by an a(ge)nt that walks along the edges of the graph, occasionally leaving ‘pheromone’ traces at nodes, and using those traces to guide its exploration. It was shown there that the ant can cover a static graph within time nd, where n is the number of vertices and d the diameter of the graph. In this work we further investigate the performance of the method on dynamic graphs, where edges may appear or disappear during the search process. In particular we prove that (a) if a certain spanning subgraph S is stable during the period of covering, then the method is guaranteed to cover the graph within time nds, where ds is the diameter of S, and (b) if a failure occurs on each edge with probability p, then the expected cover time is bounded from above by nd((logΔ/log(1/p))+((1+p)/(1−p))), where Δ is the maximum vertex degree in the graph. We also show that (c) if G is a static tree then it is covered within time 2n.  相似文献   

7.
《Parallel Computing》1990,15(1-3):133-145
This paper describes a parallel algorithm for the LU decomposition of band matrices using Gaussian elimination. The matrix dimension is n × n with 2r−1 diagonals. In the case when 1 r 2 p an optimal number of the processors, , is determined according to the equation . When 2 p r n a number of processors, p, statged by Veldhorst is adopted (see [7]). For band matrix with 2r-1 diagonals (1 r 2p) the task scheduling procedure with the aim to obtain maximal parallelism in system operation, i.e. good load balancing, is defined. The architecture of the system is of MIMD type. The connection between the processors is realised via a common bus. Communication and synchronization is performed by message passing technique.  相似文献   

8.
This paper presents a new practical bit-vector algorithm for solving the well-known Longest Common Subsequence (LCS) problem. Given two strings of length m and n, nm, we present an algorithm which determines the length p of an LCS in O(nm/w) time and O(m/w) space, where w is the number of bits in a machine word. This algorithm can be thought of as column-wise “parallelization” of the classical dynamic programming approach. Our algorithm is very efficient in practice, where computing the length of an LCS of two strings can be done in linear time and constant (additional/working) space by assuming that mw.  相似文献   

9.
A new family of network topologies containing multiple loops is discussed in this paper. In the proposed structure, N processors are interconnected to form a graph G(m, N), m 3, where m is a parameter of the graph such that N is an even multiple of m and (m − 1) × 2[(m− l)/2]+ < N m × 2[m/2]+1. The graph G(m, N) is hamiltonian with an average node degree (3 + l/m), when m is even and exactly 3 when m is odd. Whereas, the maximum node degree is 4. The diameter of G(m, N) is upper bounded by [11m/8]+ 1. A point to point routing algorithm has been presented. Implementation of ASCEND/DESCEND algorithms in O(m) time has been discussed. It has been shown that in case of a single node failure, the diameter increases by at most 6.  相似文献   

10.
The inflation GI of a graph G with n(G) vertices and m(G) edges is obtained from G by replacing every vertex of degree d of G by a clique Kd. A set S of vertices in a graph G is a paired dominating set of G if every vertex of G is adjacent to some vertex in S and if the subgraph induced by S contains a perfect matching. The paired domination number γp(G) is the minimum cardinality of a paired dominating set of G. In this paper, we show that if a graph G has a minimum degree δ(G)2, then n(Gp(GI)4m(G)/[δ(G)+1], and the equality γp(GI) = n(G) holds if and only if G has a perfect matching. In addition, we present a linear time algorithm to compute a minimum paired-dominating set for an inflation tree.  相似文献   

11.
A c-vertex-ranking of a graph G for a positive integer c is a labeling of the vertices of G with integers such that, for any label i, deletion of all vertices with labels >i leaves connected components, each having at most c vertices with label i. A c-vertex-ranking is optimal if the number of labels used is as small as possible. We present sequential and parallel algorithms to find an optimal c-vertex-ranking of a partial k-tree, that is, a graph of treewidth bounded by a fixed integer k. The sequential algorithm takes polynomial-time for any positive integer c. The parallel algorithm takes O(log n) parallel time using a polynomial number of processors on the common CRCW PRAM, where n is the number of vertices in G.  相似文献   

12.
The backtrack search problem involves visiting all the nodes of an arbitrary binary tree given a pointer to its root subject to the constraint that the children of a node are revealed only after their parent is visited. We present a fast, deterministic backtrack search algorithm for a p-processor COMMON CRCW-PRAM, which visits any n-node tree of height h in time O((n/p+h)(logloglogp)2). This upper bound compares favourably with a natural Ω(n/p+h) lower bound for this problem. Our approach embodies novel, efficient techniques for dynamically assigning tree-nodes to processors to ensure that the work is shared equitably among them.  相似文献   

13.
An efficient parallel algorithm for merging two sorted lists is presented. The algorithm is based on a novel partitioning algorithm that splits the two lists among the processors, in a way that ensures load balance during the merge. The partitioning algorithm can itself be efficiently parallelized, allowing the solution to scale with increased numbers of processors. A shared memory multiprocessor is assumed. The time complexity for partitioning and merging is O(N/p + log N), where p is the number of processors and N is the total number of elements in the two lists. Implementation results on a twenty node Sequent Symmetry multiprocessor are also presented.  相似文献   

14.
In recent years, the conversion of residue numbers to a binary integer has been intensively studied. The Chinese Remainder Theorem (CRT) is a solution to this conversion problem of a number to the Residue Number System with a general moduli set. This paper presents a new division-free conversion approach for the conversion of residue numbers to a binary integer. The algorithm differs from others employing a great number of division instructions by using shift instructions instead. These simple instructions keep the power consumption lower. This algorithm can also be implemented with a lookup table or upon a vector machine. Both make the conversion process efficient. This division-free algorithm employs the concept of Montgomery multiplication algorithm. There are two variations of Montgomery algorithm proposed, which are algorithms MMA and IMA. The algorithm MMA is to transform the input number into the output presentation of Montgomery algorithm. Algorithm IMA is therefore inverse the computation of Montgomery algorithm to obtain the multiplicand. These two algorithms are in the complexity of O(n), where n is log2 qi. qi is a modulus. The proposed algorithm for converting the residues to a binary integer therefore runs on O(n × log m) times on O(m) processors. There are O(log m) iterations of O(n) complexity. Compared with the traditional conversion algorithm, the advantages of this proposed algorithm are not only in employing simpler operations but also in performing fewer iterations.  相似文献   

15.
This paper describes an efficient algorithm for the parallel solution of systems of linear equations with a block tridiagonal coefficient matrix. The algorithm comprises a multilevel LU-factorization based on block cyclic reduction and a corresponding solution algorithm.

The paper includes a general presentation of the parallel multilevel LU-factorization and solution algorithms, but the main emphasis is on implementation principles for a message passing computer with hypercube topology. Problem partitioning, processor allocation and communication requirement are discussed for the general block tridiagonal algorithm.

Band matrices can be cast into block tridiagonal form, and this special but important problem is dealt with in detail. It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and the last partitioning and by carefully balancing the load of the processors. The presentation of the multilevel band solver is accompanied by detailed complexity analyses.

The properties of the parallel band solver were evaluated by implementing the algorithm on an Intel iPSC hypercube parallel computer and solving a larger number of banded linear equations using 2 to 32 processors. The results of the evaluation include speed-up over a sequential processor, and the measure values are in good agreement with the theoretical values resulting from complexity analysis. It is found that the maximum asymptotic speed-up of the multilevel LU-factorization using p processors and load balancing is approximated well by the expression (p +6)/4.

Finally, the multilevel parallel solver is compared with solvers based on row and column interleaved organization.  相似文献   


16.
We consider the problem of scheduling n identical jobs with unequal ready times on m parallel uniform machines to minimize the maximum lateness. This paper develops a branch-and-bound procedure that optimally solves the problem and introduces six simple single-pass heuristic procedures that approximate the optimal solution. The branch-and-bound procedure uses the heuristics to establish an initial upper bound. On sample problems, the branch-and-bound procedure in most instances was able to find an optimal solution within 100,000 iterations with n≤80 and m≤3. For larger values of m, the heuristics provided approximate solutions close to the optimal values.  相似文献   

17.
We first show how to transform the solution of an n × n tridiagonal system into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values. The derived parallel algorithm allows the tridiagonal system to be solved in O(log n) time on an unshuffle network with Θ(n /log n) processors. It is cost-optimal in the sense that processor number times execution time is minimized. Our solver is conceptually simple and easy for implementation.  相似文献   

18.
This paper outlines an algorithm for optimum linear ordering (OLO) of a weighted parallel graph with O(n log k) worst-case time complexity, and O(n + k log(n/k) log k) expected-case time complexity, where n is the total number of nodes and k is the number of chains in the parallel graph. Next, the two-layer OLO problem is considered, where the goal is to place the nodes linearly in two routing layers minimizing the total wire length. The two-layer problem is shown to subsume the maxcut problem and a befitting heuristic algorithm is proposed. Experimental results on randomly generated samples show that the heuristic algorithm runs very fast and outputs optimum solutions in more than 90% instances.  相似文献   

19.
In the literature, there are quite a few sequential and parallel algorithms to solve problems on distance-hereditary graphs. Two well-known classes of graphs, which contain trees and cographs, belong to distance-hereditary graphs. We consider the vertex-coloring problem on distance-hereditary graphs. Let T/sub d/(|V|, |E|) and P/sub d/d(|V|, |E|) denote the time and processor complexities, respectively, required to construct a decomposition tree representation of a distance-hereditary graph G=(V,E) on a PRAM model M/sub d/. Our algorithm runs in O(T/sub d/(|V|, |E|)+log|V|) time using O(P/sub d/(|V|, |E|)+|V|/log|V|) processors on M/sub d/. The best known result for constructing a decomposition tree needs O(log/sup 2/ |V|) time using O(|V|+|E|) processors on a CREW PRAM. If a decomposition tree is provided as input, we solve the problem in O(log |V|) time using O(|V|/log |V|) processors on an EREW PRAM. To the best of our knowledge, there is no parallel algorithm for this problem on distance-hereditary graphs.  相似文献   

20.
In this paper, we study parallel branch and bound on fine grained hypercube multiprocessors. Each processor in a fine grained system has only a very small amount of memory available. Therefore, current parallel branch and bound methods for coarse grained systems ( 1000 nodes) cannot be applied, since all these methods assume that every processor stores the path from the node it is currently processing back to the node where the process was created (the back-up path). Furthermore, the much larger number of processors available in a fine grained system makes it imperative that global information (e.g. the current best solution) is continuously available at every processor; otherwise the amount of unnecessary search would become intolerable. We describe an efficient branch-and-bound algorithm for fine grained hypercube multiprocessors. Our method uses a global scheme where all processors collectively store all back-up paths such that each processor needs to store only a constant amount of information. At each iteration of the algorithm, all current nodes may decide whether they need to create new children, be pruned, or remain unchanged. We describe an algorithm that, based on these decisions, updates the current back-up paths and distributes global information in O(log m) steps, where m is the current number of nodes. This method also includes dynamic allocation of search processes to processors and provides optimal load balancing. Even if very drastic changes in the set of current nodes occur, our load balancing mechanism does not suffer any slow down.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号