1.

A parallel algorithm for computing Fourier transforms on the star graph

Fragopoulou P. Akl S.G. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(5):525-531

The n-star graph, denoted by S_n, is one of the graph networks that have recently been proposed as attractive alternatives to the n-cube topology for interconnecting processors in parallel computers. We present a parallel algorithm for the computation of the Fourier transform on the star graph. The algorithm requires O(n²) multiply-add steps for an input sequence of n! elements, and is hence cost-optimal with respect to the sequential algorithm on which it is based. This is believed to be the first algorithm, and the only one to date, for the computation of the Fourier transform on the star graph.

2.

New fast parallel algorithm for the connected component problem and its VLSI implementation

SUJIT DEY PRADIP K. SRIMANI 《International journal of systems science》2013,44(11):2177-2185

We present an O(log n) compare-exchange time parallel algorithm to compute the connected components of a given graph. We also introduce a simple regular VLSI architecture on which the proposed algorithm can readily be implemented, requiring n³ identical processing elements and O(n) communication time, where n is the number of vertices in the graph.

3.

An optimal broadcasting algorithm without message redundancy in star graphs

Jang-Ping Sheu Chao-Tsung Wu Tzung-Shi Chen 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(6):653-658

Based on the algorithm of V.E. Mendia and D. Sarkar (1992), we propose an optimal and nonredundant distributed broadcasting algorithm in star graphs. For an n-dimensional star graph, our algorithm takes O(n log₂ n) time and guarantees that all nodes in the star graph receive the message exactly once. Moreover, broadcasting m packets in a pipelined fashion takes O(m log₂ n + n log₂ n) time, owing to the nonredundant property of our broadcasting algorithm.

4.

A hierarchical optimization method for large-scale 3D parallel hierarchical scalable matrix multiplication

卢炼阳爱民《计算机应用研究》2017,34(6)

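To further improve the parallel efficiency of scalable matrix multiplication on large-scale platforms, a hierarchical optimization method for parallel hierarchical scalable matrix multiplication is proposed. First, building on a study of pivot-row and pivot-column communication in the scalable matrix multiplication (SMM) algorithm, the processor grid is partitioned hierarchically, at a higher level, into rectangular groups. This turns the two-dimensional matrix multiplication computation into a three-dimensional one; the corresponding intra-cluster and inter-cluster communication procedures are designed, giving a hierarchical parallel optimization of SMM (HSMM). Second, the proposed HSMM algorithm is analyzed theoretically: its communication cost is analyzed and predicted case by case, and a rule for choosing the number of clusters that gives the best computation cost is derived. Finally, experiments on the Grid5000 and BlueGene/P test platforms verify the effectiveness of the proposed algorithm and the correctness of the theoretical analysis.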

5.

Allocating tree structured programs in a distributed system with uniform communication costs

Billionnet A. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(4):445-448

Studies the complexity of the problem of allocating m modules to n processors in a distributed system so as to minimize total communication and execution costs. When the communication graph is a tree, Bokhari has shown that the optimum allocation can be determined in O(mn²) time. Recently, this result has been generalized by Fernandez-Baca, who has proposed an O(mn^{k+1}) allocation algorithm for the case where the communication graph is a partial k-tree. The author shows that, when communication costs are uniform, the module allocation problem can be solved in O(mn) time if the communication graph is a tree. This algorithm is asymptotically optimum.

6.

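A parallel fast Fourier transform algorithm based on the star interconnection network (total citations: 6; self-citations: 0; citations by others: 6)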

史云涛侯紫峰宋建平《计算机研究与发展》2002,39(5):625-630

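The star interconnection network is an interconnection topology well suited to large-scale parallel computing. Exploiting the variety of ways in which the star interconnection network can be recursively decomposed, a method for implementing a parallel fast Fourier transform algorithm on the star interconnection network is proposed. The method effectively reduces the communication overhead between processor nodes during the computation. The proposed mapping between star-graph nodes and data, and the idea used to implement the parallel FFT, can be extended to implementing other parallel algorithms on the star interconnection network, such as solving systems of linear equations and matrix multiplication.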

7.

Task clustering and scheduling for distributed memory parallel architectures

Palis M.A. Jing-Chiou Liou Wei D.S.L. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(1):46-55

This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n + e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication.

8.

An Optimal Parallel Co-Connectivity Algorithm

Ka Wong Chong Stavros D. Nikolopoulos Leonidas Palios 《Theory of Computing Systems》2004,37(4):527-546

In this paper we consider the problem of computing the connected components of the complement of a given graph. We describe a simple sequential algorithm for this problem, which works on the input graph and not on its complement, and which for a graph on n vertices and m edges runs in optimal O(n+m) time. Moreover, unlike previous linear co-connectivity algorithms, this algorithm admits efficient parallelization, leading to an optimal O(log n)-time and O((n+m)log n)-processor algorithm on the EREW PRAM model of computation. It is worth noting that, for the related problem of computing the connected components of a graph, no optimal deterministic parallel algorithm is currently available. The co-connectivity algorithms find applications in a number of problems. In fact, we also include a parallel recognition algorithm for weakly triangulated graphs, which takes advantage of the parallel co-connectivity algorithm and achieves an O(log² n) time complexity using O((n+m²) log n) processors on the EREW PRAM model of computation.
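
To make the co-connectivity idea concrete, the following is a minimal sequential sketch of one standard technique for finding the components of the complement without ever building it. It is not claimed to be the authors' algorithm. The function name `co_components` and the edge-list input format are assumptions made for this illustration; the graph is assumed to be simple, with vertices numbered 0..n-1.

```python
def co_components(n, edges):
    """Connected components of the complement of a simple graph G.

    Works on G's own adjacency sets; the complement is never materialized.
    (Illustrative sketch; names and input format are assumptions.)
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    unvisited = set(range(n))
    components = []
    while unvisited:
        root = unvisited.pop()
        comp = [root]
        stack = [root]
        while stack:
            u = stack.pop()
            # An unvisited vertex that is NOT adjacent to u in G
            # is a neighbor of u in the complement.
            reached = [v for v in unvisited if v not in adj[u]]
            for v in reached:
                unvisited.remove(v)
            comp.extend(reached)
            stack.extend(reached)
        components.append(sorted(comp))
    return sorted(components)
```

Each scan of `unvisited` either removes a vertex for good or skips it because of an edge of G incident to `u`, so the total work is proportional to n + m under the usual assumption of constant-time hashing.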

9.

Optimal broadcasting on the star graph (total citations: 2; self-citations: 0; citations by others: 2)

Mendia V.E. Sarkar D. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):389-396

The star graph has been shown to be an attractive alternative to the widely used n-cube. Like the n-cube, the star graph possesses rich structure and symmetry as well as fault-tolerant capabilities, but has a smaller diameter and degree. However, very few algorithms exist to show its potential as a multiprocessor interconnection network. Many fast and efficient parallel algorithms require broadcasting as a basic step. An optimal algorithm for one-to-all broadcasting in the star graph is proposed. The algorithm can broadcast a message to N processors in O(log₂ N) time. The algorithm exploits the rich structure of the star graph and works by recursively partitioning the original star graph into smaller star graphs. In addition, an optimal all-to-all broadcasting algorithm is developed.
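
The degree and diameter claims can be checked on small cases with a short sketch. It assumes the usual definition of the star graph, which the abstract does not restate: the vertices of S_n are the n! permutations of n symbols, and two vertices are adjacent when one is obtained from the other by swapping the first symbol with the i-th symbol. The function names are hypothetical.

```python
from collections import deque


def star_neighbors(perm):
    """Neighbors of a vertex of S_n: swap the first symbol with the i-th one."""
    out = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        out.append(tuple(p))
    return out


def star_diameter(n):
    """Return (vertex count, diameter) of S_n by BFS from the identity.

    S_n is vertex-symmetric, so the eccentricity of one vertex
    equals the diameter of the whole graph.
    """
    start = tuple(range(n))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in star_neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist), max(dist.values())
```

For example, `star_diameter(4)` reports 24 vertices and diameter 4, with degree 3 at every vertex, whereas a 5-dimensional hypercube with 32 vertices has degree 5 and diameter 5.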

10.

A comparative study of topological properties of hypercubes and star graphs

Day K. Tripathi A. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(1):31-38

Undertakes a comparative study of two important interconnection network topologies, the star graph and the hypercube, from the graph theory point of view. Topological properties are derived for the star graph and are compared with the corresponding properties of the hypercube. Among other results, the authors determine necessary and sufficient conditions for shortest path routing and characterize maximum-sized families of parallel paths between any two nodes of the star graph. These parallel paths are proven to be of minimum length to within a small additive constant. They also define greedy and asymptotically balanced spanning trees to support broadcasting and personalized communication on the star graph. These results confirm the already claimed topological superiority of the star graph over the hypercube.

11.

Fully dynamic maintenance of k-connectivity in parallel

Weifa Liang Brent R.P. Hong Shen 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(8):846-864

Given a graph G=(V, E) with n vertices and m edges, the k-connectivity of G denotes either the k-edge connectivity or the k-vertex connectivity of G. In this paper, we deal with the fully dynamic maintenance of k-connectivity of G in the parallel setting for k=2, 3. We study the problem of maintaining the k-edge/vertex connected components of a graph undergoing repeated dynamic updates, such as edge insertions and deletions, and answering the query of whether two vertices are included in the same k-edge/vertex connected component. Our major results are the following: (1) An NC algorithm for the 2-edge connectivity problem is proposed, which runs in O(log n log(m/n)) time using O(n^{3/4}) processors per update and query. (2) It is shown that the biconnectivity problem can be solved in O(log² n) time using O(nα(2n, n)/log n) processors per update and O(1) time with a single processor per query, or in O(log n log(m/n)) time using O(nα(2n, n)/log n) processors per update and O(log n) time using O(nα(2n, n)/log n) processors per query, where α(.,.) is the inverse of Ackermann's function. (3) An NC algorithm for the triconnectivity problem is also derived, which takes O(log n log(m/n) + log n log log n/α(3n, n)) time using O(nα(3n, n)/log n) processors per update and O(1) time with a single processor per query. (4) An NC algorithm for the 3-edge connectivity problem is obtained, which has the same time and processor complexities as the algorithm for the triconnectivity problem. To the best of our knowledge, the proposed algorithms are the first NC algorithms for these problems using O(n) processors, in contrast to the Ω(m) processors needed to solve them from scratch. In particular, the proposed NC algorithm for the 2-edge connectivity problem uses only O(n^{3/4}) processors. All the proposed algorithms run on a CRCW PRAM.

12.

Algorithms for a Class of Isotonic Regression Problems (total citations: 4; self-citations: 0; citations by others: 4)

P. M. Pardalos G. Xue 《Algorithmica》1999,23(3):211-222

The isotonic regression problem has applications in statistics, operations research, and image processing. In this paper a general framework for the isotonic regression algorithm is proposed. Under this framework, we discuss the isotonic regression problem in the case where the directed graph specifying the order restriction is a directed tree with n vertices. A new algorithm is presented for this case, which can be regarded as a generalization of the PAV algorithm of Ayer et al. Using a simple tree structure such as the binomial heap, the algorithm can be implemented in O(n log n) time, improving the previously best known O(n²) time algorithm. We also present linear time algorithms for special cases where the directed graph is a path or a star. Received September 2, 1997; revised January 2, 1998, and February 16, 1998.
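
For the special case where the order restriction is a path, the classical pool-adjacent-violators (PAV) idea can be sketched as follows. This is an illustrative implementation under the assumption of a weighted least-squares objective with a nondecreasing fit, which the abstract does not spell out; it is not the tree algorithm of the paper, and the function name `pav` is hypothetical.

```python
def pav(y, w=None):
    """Nondecreasing least-squares fit to y on a path (pool adjacent violators).

    Assumes a weighted least-squares objective; weights default to 1.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

Every point is pushed once and each merge removes one block, so the loop performs a linear number of block operations, in line with the linear-time bound quoted for the path case.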

13.

A parallel algorithm for finding the connected components of a graph

唐策善梁维发《软件学报》1993,4(4):61-66

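Given an undirected graph G(V,E) with |V|=n and |E|=m, this paper proposes a new algorithm for finding the connected components of a graph on the SIMD shared-memory model, based on the principle of fast data propagation through the graph. Specifically, on the SIMD-CREW shared-memory model, finding the connected components takes O(log² n) time with O(n²/log n) processors, while on the SIMD-CRCW shared-memory model it takes O(log n) time with O(n²) processors. Compared with the well-known Hirschberg algorithm, the main differences are: 1) the solution method is different; 2) the proposed algorithm is simple and easy to understand.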

14.

A parallel algorithm for graph traversal based on CREW

廖常武《计算机与现代化》2006,(9):12-14

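Motivated by vertex-based graph traversal under the sequential model, an algorithm for traversing an undirected graph on the CREW PRAM parallel model is proposed. The algorithm finds a shortest-path spanning tree of the undirected graph, replaces each edge of that tree with two directed edges, one upward and one downward, to form an Euler circuit, and uses the Euler-tour technique to compute prefix sums; the vertices corresponding to the prefix sums give the order in which the undirected graph is traversed. The time complexity of the algorithm is shown to be O(n + log n).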

15.

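A parallel three-list algorithm for the knapsack problem without memory conflicts (total citations: 4; self-citations: 0; citations by others: 4)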

李肯立李仁发李庆华《计算机学报》2006,29(2):345-352

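The knapsack problem is a classic NP-hard problem with very important applications in cryptography and number theory. Applying the design idea of the well-known two-list algorithm for the knapsack problem to a three-list search, and using a divide-and-conquer strategy together with an optimal merging algorithm free of memory conflicts, this paper proposes a parallel three-list algorithm based on the EREW-SIMD shared-memory model. The algorithm uses O(2^{n/4}) processors and O(2^{3n/8}) shared memory, and solves the n-dimensional knapsack problem in O(2^{3n/8}) time. A comparison with results in the existing literature shows that the algorithm clearly improves on them: it is a memory-conflict-free parallel algorithm that solves the knapsack problem in less than O(2^{n/2}) time on less than O(2^{n/2}) hardware resources.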
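
As background for the two-list idea that the three-list algorithm builds on, here is a minimal sequential sketch of the classical two-list (meet-in-the-middle) search. It assumes the subset-sum form of the knapsack problem, as used in cryptography; the function names are hypothetical, and this is neither the parallel three-list algorithm of the paper nor a memory-conflict-free implementation.

```python
def subset_sums(items):
    """All 2^k subset sums of a list of k integers."""
    sums = [0]
    for a in items:
        sums += [s + a for s in sums]
    return sums


def two_list_subset_sum(weights, target):
    """Decide whether some subset of weights sums to target.

    Builds two lists of about 2^(n/2) subset sums each, sorts one ascending
    and one descending, then scans both with two pointers.
    """
    half = len(weights) // 2
    left = sorted(subset_sums(weights[:half]))
    right = sorted(subset_sums(weights[half:]), reverse=True)
    i = j = 0
    while i < len(left) and j < len(right):
        total = left[i] + right[j]
        if total == target:
            return True
        if total < target:
            i += 1  # need a larger sum: advance in the ascending list
        else:
            j += 1  # need a smaller sum: advance in the descending list
    return False
```

Both lists hold about 2^{n/2} entries, which is the sequential time and space baseline against which the bounds in the abstract are compared.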

16.

Study of General Incomplete Star Interconnection Networks

史云涛侯紫峰宋建平《计算机科学技术学报》2002,17(3):0-0

1 Introduction. The star graph [1], proposed as a particular case of Cayley graphs [2], is vertex- and edge-symmetric, strongly hierarchical, maximally fault-tolerant, strongly resilient, and has diameter and node degree that are superior to those of a similar-sized hypercube (which is also a Cayley graph) for parallel computers [3]. Major references can be found in studying the star graph regarding its properties [1], embedding capability [4], communication capability [5-7], and fault-tolerance capability…

17.

Deadlock-free routing algorithms on the star graph (total citations: 4; self-citations: 0; citations by others: 4)

石凤仙熊鹏荣周玉林朱洪《计算机学报》1998,21(10):946-951

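The star graph has many good topological properties and is an interconnection network model for parallel computing that may replace the traditional hypercube. In this paper, the authors study deadlock-free routing algorithms on the star graph, addressing the deadlocks that can arise in such a highly regular network. First, using properties of matching bases in the star graph, a definition of a regular mapping from S_n(B) to S_k is given; next, two deadlock-free restriction conditions on the star graph are proposed; finally, a routing algorithm is proved to satisfy the deadlock-free restriction conditions. The authors also propose a minimal deadlock-free restriction condition for routing algorithms on the star graph.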

18.

A distributed graph algorithm for the detection of local cycles and knots

Boukerche A. Tropper C. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(8):748-757

In this paper, a distributed cycle/knot detection algorithm for general graphs is presented. The algorithm distinguishes between cycles and knots and is, to our knowledge, the first algorithm to do so. It is especially relevant to an application such as parallel simulation, in which 1) cycles and knots can arise frequently, 2) the size of the graph is very large, and 3) it is necessary to know if a given node is in a cycle or a knot. It requires less communication than previous algorithms: 2m vs. (at least) 4m for the Chandy and Misra algorithm, where m is the number of links in the graph. It requires O(n log n) bits of memory, where n is the number of nodes. The algorithm differs from the classical diffusing computation methods through its use of incomplete search messages to speed up the computation. We introduce a marking scheme in order to identify strongly connected subcomponents of the graph which cannot reach the initiator of the algorithm. This allows us to distinguish between the case in which the initiator is in a cycle (only) and the case in which it is in a knot.

19.

Space and time optimal parallel sequence alignments

Rajko S. Aluru S. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(12):1070-1081

We present the first space and time optimal parallel algorithm for the pairwise sequence alignment problem, a fundamental problem in computational biology. This problem can be solved sequentially in O(mn) time and O(m+n) space, where m and n are the lengths of the sequences to be aligned. The fastest known parallel space-optimal algorithm for pairwise sequence alignment takes optimal O((m+n)/p) space, but suboptimal O((m+n)²/p) time, where p is the number of processors. On the other hand, the most space economical time-optimal parallel algorithm takes O(mn/p) time, but O(m + n/p) space. We close this gap by presenting an algorithm that achieves both time and space optimality, i.e. requires only O((m+n)/p) space and O(mn/p) time. We also present an experimental evaluation of the proposed algorithm on an IBM xSeries cluster. Although presented in the context of full sequence alignments, our algorithm is applicable to other alignment problems in computational biology including local alignments and syntenic alignments. It is also a useful addition to the range of techniques available for parallel dynamic programming.
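
The sequential O(mn)-time, linear-space bound can be illustrated with a small sketch that computes only the optimal global alignment score while keeping two rows of the dynamic programming table. The scoring values (`match`, `mismatch`, `gap`) and the function name are assumptions for this example; recovering the alignment itself in linear space needs a divide-and-conquer step such as Hirschberg's, which is not shown, and none of this is the parallel algorithm of the paper.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b in O(len(a) * len(b)) time
    and O(min(len(a), len(b))) extra space (score only, no traceback).

    The scoring scheme is an illustrative assumption.
    """
    if len(b) > len(a):
        a, b = b, a  # keep the shorter sequence along the row
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

Only the previous and current rows are alive at any time, which is what brings the space down from O(mn) to linear.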

20.

An optimal algorithm for fuzzy clustering computation (total citations: 14; self-citations: 0; citations by others: 14)

马军邵陆《软件学报》2001,12(4):578-581

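This paper gives the geometric meaning of the transitive closure of a fuzzy relation on the corresponding fuzzy graph, and proposes an optimal fuzzy clustering algorithm based on computing the connected components of a graph. For any n given samples, the worst-case time complexity function T(n) of the new algorithm satisfies O(n) ≤ T(n) ≤ O(n²). Compared with the O(n³ log n) computing time of the classical fuzzy clustering algorithm based on computing the fuzzy transitive closure, the new algorithm reduces it by at least O(n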
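
The link between fuzzy clustering and connected components can be sketched in a few lines. The sketch relies on a standard fact: for a symmetric similarity matrix, the clusters obtained by cutting the max-min transitive closure at a threshold are exactly the connected components of the graph that keeps the pairs whose similarity reaches that threshold. The function name `fuzzy_clusters`, the matrix input format, and the threshold parameter `lam` are assumptions for this illustration, which is not claimed to be the authors' algorithm.

```python
def fuzzy_clusters(sim, lam):
    """Clusters at threshold lam: connected components of the graph whose
    edges are the pairs (i, j) with sim[i][j] >= lam.

    sim is assumed to be a symmetric n x n similarity matrix.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= lam:
                parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

The double loop inspects each pair of samples once, so no transitive closure is ever computed explicitly.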