1.

H. B. Zhou, Parallel Computing, 1993, 19(12):1359-1373

This paper presents a family of multiway partitioning algorithms for graphs with weighted nodes and edges, based on a two-stage construct-and-refine mechanism. The resulting partitions can be used to allocate program units to distributed processors so as to minimize completion time, and also in design-automation applications. In the constructive stage, four clustering algorithms build raw partitions; in the refinement stage, the number of clusters is first adjusted to the number of processors, and the partitioning cost is then improved iteratively with a Kernighan-Lin-based heuristic. The approach extends several state-of-the-art methods. A performance comparison of the proposed algorithms, based on experimental results, is given.
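The refinement stage above relies on a Kernighan-Lin (KL) style heuristic. As a rough illustration only, here is a textbook single KL pass for a 2-way split, not the paper's multiway algorithm; the dict-of-frozensets graph encoding and the names `kl_pass` and `cut_weight` are assumptions made for this sketch.

```python
from itertools import product


def kl_pass(weights, part_a, part_b):
    """One Kernighan-Lin pass over a 2-way partition of an undirected graph.

    weights maps frozenset({u, v}) -> edge weight; part_a and part_b are
    equal-size vertex sets. Returns the new sets and the cut reduction.
    """
    def w(u, v):
        return weights.get(frozenset((u, v)), 0)

    a, b = set(part_a), set(part_b)

    def d_value(v, own, other):
        # External minus internal connection cost of v.
        return sum(w(v, x) for x in other) - sum(w(v, x) for x in own if x != v)

    D = {v: d_value(v, a, b) for v in a}
    D.update({v: d_value(v, b, a) for v in b})

    free_a, free_b = set(a), set(b)
    swaps, gains = [], []
    while free_a and free_b:
        # Tentatively swap the unlocked pair with the largest gain.
        x, y = max(product(free_a, free_b),
                   key=lambda p: D[p[0]] + D[p[1]] - 2 * w(p[0], p[1]))
        swaps.append((x, y))
        gains.append(D[x] + D[y] - 2 * w(x, y))
        free_a.discard(x)
        free_b.discard(y)
        # Update D of still-unlocked vertices as if x and y had been swapped.
        for v in free_a:
            D[v] += 2 * w(v, x) - 2 * w(v, y)
        for v in free_b:
            D[v] += 2 * w(v, y) - 2 * w(v, x)

    # Apply only the prefix of swaps with the largest positive cumulative gain.
    best_k, best_gain, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_k, best_gain = k, running
    for x, y in swaps[:best_k]:
        a.remove(x)
        b.remove(y)
        a.add(y)
        b.add(x)
    return a, b, best_gain


def cut_weight(weights, part_a):
    """Total weight of edges with exactly one endpoint in part_a."""
    return sum(wt for e, wt in weights.items() if len(e & part_a) == 1)


# Two triangles joined by edge (2, 3), starting from a poor split.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = {frozenset(e): 1 for e in edges}
a, b, gain = kl_pass(weights, {0, 1, 3}, {2, 4, 5})
print(sorted(a), sorted(b), gain, cut_weight(weights, a))  # [0, 1, 2] [3, 4, 5] 4 1
```

A multiway refinement would typically apply such passes to pairs of parts; how the paper schedules those passes is not stated in the abstract.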

2.

Hardware/Software Partitioning Using Heuristic Branch-and-Bound (total citations: 1; self-citations: 0; citations by others: 1)

盛蓝平, 林涛, Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), 2005, 17(3):414-417

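A hardware/software partitioning method that uses a task graph as the system description is proposed. First, the hardware/software tendency of each node with respect to chip area, execution time, and communication is computed and combined with each node's weight factor to obtain heuristic parameters; a heuristic branch-and-bound method then partitions the system to obtain feasible and optimal solutions. The proposed algorithm and the partitioning algorithms of RECOD and UNRET were implemented and run on the same platform for systems with 10, 15, 20, 25, and 30 nodes, and their performance was compared in terms of initiation interval, minimum initiation interval, and running time. The proposed algorithm is suited to coarse-grained partitioning of small- and medium-scale systems.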
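To show the branch-and-bound mechanics in isolation, here is a minimal sketch under a deliberately simplified, hypothetical cost model: tasks run sequentially, each has a software time, a hardware time, and a hardware area, and total hardware area must stay within a budget. Communication costs and the paper's tendency-based heuristic parameters are not modeled; `partition_bb` and its arguments are illustrative names.

```python
from math import inf


def partition_bb(sw_time, hw_time, hw_area, area_budget):
    """Minimize total time subject to a hardware-area budget (hypothetical model)."""
    n = len(sw_time)
    # suffix_min[i]: optimistic time for tasks i..n-1, each at its faster option.
    suffix_min = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(sw_time[i], hw_time[i])

    best = {"time": inf, "assign": None}
    assign = [None] * n

    def search(i, time, area):
        if time + suffix_min[i] >= best["time"]:
            return  # bound: this branch cannot beat the incumbent
        if i == n:
            best["time"], best["assign"] = time, assign.copy()
            return
        if area + hw_area[i] <= area_budget:  # hardware branch, only if it fits
            assign[i] = "HW"
            search(i + 1, time + hw_time[i], area + hw_area[i])
        assign[i] = "SW"  # software branch
        search(i + 1, time + sw_time[i], area)

    search(0, 0, 0)
    return best["time"], best["assign"]


result = partition_bb([4, 6, 3], [1, 2, 2], [3, 4, 1], area_budget=5)
print(result)  # (8, ['SW', 'HW', 'HW'])
```

The bound ignores the area constraint, so it never overestimates the remaining time; that keeps the pruning safe while still cutting most of the search tree on small inputs.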

3.

A Multiprocessor Scheduling Algorithm Based on Graph Matching

周向东, 林澜, 陈国勋, 施伯乐, Journal of Chinese Computer Systems (小型微型计算机系统), 2003, 24(4):643-647

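This paper studies task scheduling algorithms for multiprocessor systems with high communication latency (cluster systems). Unlike earlier algorithms, which mainly consider the critical path of the task graph, it establishes a correspondence between scheduling a task graph and matching in its bipartite graph, and on that basis proposes a new heuristic algorithm. Simulation experiments show that the algorithm produces good schedules.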
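The abstract does not spell out the matching step, so as background only, here is the standard augmenting-path (Kuhn) algorithm for maximum bipartite matching; the task-to-processor compatibility lists below are made up, and this is not the paper's scheduling heuristic.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum matching in a bipartite graph via augmenting paths (Kuhn's algorithm).

    adj[u] lists the right-side vertices (e.g. processors) compatible with
    left-side vertex u (e.g. a task).
    """
    match_right = [-1] * n_right  # right vertex -> matched left vertex, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or its current partner can be re-matched elsewhere.
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    size = sum(try_augment(u, [False] * n_right) for u in range(len(adj)))
    return size, match_right


# Hypothetical compatibility: task 0 -> {P0, P1}, task 1 -> {P0}, task 2 -> {P1, P2}.
size, match_right = max_bipartite_matching([[0, 1], [0], [1, 2]], n_right=3)
print(size, match_right)  # 3 [1, 0, 2]
```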

4.

An Adaptive Code-Partitioning Optimization Algorithm Based on DAGs

周静, 曾国荪, Computer Engineering (计算机工程), 2007, 33(20):15-17

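The two main tasks of parallel compilation are program code partitioning and scheduling. Many solutions to the scheduling problem already exist, but very little research has addressed code partitioning for extracting parallelism. This paper proposes a new algorithm that partitions a DAG by merging nodes. Case analysis shows that the algorithm is an effective, low-complexity, adaptive code-partitioning solution and that it is also applicable to task-graph partitioning for heterogeneous computing.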
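The abstract does not give the paper's merging criteria. One simple, hypothetical rule that illustrates partitioning a DAG by merging nodes is to collapse linear chains: merge u into v's cluster when u has exactly one successor v and v has exactly one predecessor u. Such a merge does not serialize any pair of tasks that could otherwise run concurrently. The function name and graph below are illustrative.

```python
from collections import defaultdict


def merge_chains(nodes, edges):
    """Cluster a DAG by collapsing linear chains (a hypothetical merge rule).

    A link u -> v is collapsed when u has exactly one successor and v has
    exactly one predecessor. Returns a list of clusters (lists of nodes).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    clusters = []
    for n in nodes:
        # Skip nodes that will be absorbed into their predecessor's chain.
        if len(pred[n]) == 1 and len(succ[pred[n][0]]) == 1:
            continue
        cluster, tail = [n], n
        while len(succ[tail]) == 1 and len(pred[succ[tail][0]]) == 1:
            tail = succ[tail][0]
            cluster.append(tail)
        clusters.append(cluster)
    return clusters


# A diamond b -> {c, d} -> e with a chain in front (a -> b) and behind (e -> f).
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e"), ("e", "f")]
clusters = merge_chains(["a", "b", "c", "d", "e", "f"], edges)
print(clusters)  # [['a', 'b'], ['c'], ['d'], ['e', 'f']]
```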

5.

Tuning the granularity of parallelism for distributed graph processing

Xinyuan Luo, Sai Wu, Wei Wang, Lidan Shou, Distributed and Parallel Databases, 2017, 35(2):117-148

Popular distributed graph processing frameworks, such as Pregel and GraphLab, are based on the vertex-centric computation model, in which users write a customized Compute function that processes each vertex's data iteratively. Vertices are evenly partitioned among the compute nodes, and the granularity of parallelism of a graph algorithm is normally tuned by adjusting the number of compute nodes. The vertex-centric model splits the computation into phases. Within a phase, the computation proceeds as an embarrassingly parallel process, because no communication between compute nodes occurs. By default, current graph engines handle only one iteration of the algorithm per phase. In this paper, however, we find that the granularity of parallelism can also be tuned by aggregating the computation of multiple iterations into one phase, which has a significant impact on the performance of the graph algorithm. In the ideal case, if all computations are handled in one phase, the whole algorithm becomes embarrassingly parallel and the benefit of parallelism is maximized. Based on this observation, we propose two approaches, a function-based approach and a parameter-based approach, to automatically transform a Pregel algorithm into a new one with tunable granularity of parallelism. We study the cost of such a transformation and the trade-off between the granularity of parallelism and performance, which offers a new direction for tuning the performance of parallel algorithms. Finally, the approaches are implemented in our graph processing system, N2, and we illustrate their performance using popular graph algorithms.

6.

A Parallel HPCG Algorithm and Its Efficient Implementation on a Chinese-Made Heterogeneous System

刘芳芳, 王志军, 汪荃, 吴丽鑫, 马文静, 杨超, 孙家昶, Journal of Software (软件学报), 2021, 32(8):2341-2351

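The HPCG benchmark is a newer metric for ranking supercomputers. It mainly measures a supercomputer's ability to solve large sparse linear systems, is closer to real applications, and has attracted wide attention in recent years. Developing heterogeneous many-core parallel HPCG software for Chinese-made supercomputers is highly significant: it can raise the HPCG ranking of these machines and also offers many applications a reference for parallel algorithms and optimization techniques. Targeting a Chinese-made complex heterogeneous supercomputer, this work first parallelizes HPCG with a block graph-coloring algorithm and proposes a graph-coloring algorithm suited to structured grids. The algorithm outperforms the traditional JPL and CC algorithms in parallel performance and yields high-quality colorings; applied to HPCG, it reduces the iteration count by 3 and improves overall performance by 6%. The work then analyzes the transfer overhead between the components of the complex heterogeneous system, proposes a task-partitioning method better suited to HPCG, and carries out fine-grained optimizations of the sparse-matrix storage format, sparse-matrix reordering, and memory access. For multi-process runs, an inner/outer-region partitioning algorithm hides the neighbor communication in the core kernels SpMV and SymGS. In the final full-system test, performance reached 1.67% of the machine's peak, and the full system's weak-scaling parallel efficiency relative to a single node reached 92%.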
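Graph coloring matters here because HPCG's SymGS smoother sweeps a 27-point stencil on a structured 3D grid, and grid points of the same color can be updated concurrently. The sketch below shows the simplest textbook multicoloring for that stencil, an 8-color scheme based on the parity of each coordinate; it is not the paper's block coloring algorithm, and the function names are assumptions.

```python
from itertools import product


def parity_color(x, y, z):
    """Color in 0..7 derived from the parity of each grid coordinate."""
    return (x % 2) + 2 * (y % 2) + 4 * (z % 2)


def check_coloring(nx, ny, nz):
    """Return True if no two 27-point-stencil neighbours share a color."""
    for x, y, z in product(range(nx), range(ny), range(nz)):
        c = parity_color(x, y, z)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            if (dx, dy, dz) == (0, 0, 0):
                continue
            xx, yy, zz = x + dx, y + dy, z + dz
            if 0 <= xx < nx and 0 <= yy < ny and 0 <= zz < nz:
                if parity_color(xx, yy, zz) == c:
                    return False
    return True


# Neighbours differ by exactly 1 in at least one coordinate, so that parity differs.
colors = {parity_color(x, y, z) for x, y, z in product(range(4), repeat=3)}
print(len(colors), check_coloring(4, 4, 4))  # 8 True
```

Eight colors is generally more than a tuned coloring needs, and the color count affects how quickly Gauss-Seidel converges, which is consistent with the abstract's point that coloring quality changes the HPCG iteration count.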

7.

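Research on a Parallel HPCG Algorithm and Its Efficient Implementation on a Class of Chinese-Made Complex Heterogeneous Systems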

刘芳芳, 王志军, 汪荃, 吴丽鑫, 马文静, 杨超, 孙家昶, Journal of Software (软件学报), 2020, 31(7)

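The HPCG benchmark is a newer metric for ranking supercomputers. It mainly measures a supercomputer's ability to solve large sparse linear systems, is closer to real applications, and has attracted wide attention in recent years. Developing heterogeneous many-core parallel HPCG software for Chinese-made supercomputers is highly significant: it can raise the HPCG ranking of these machines and also offers many applications a reference for parallel algorithms and optimization techniques. This paper targets a Chinese-made complex heterogeneous supercomputer. It first parallelizes HPCG with a block graph-coloring algorithm and proposes a graph-coloring algorithm suited to structured grids; this algorithm outperforms the traditional JPL and CC algorithms in parallel performance and yields high-quality colorings, and when applied to HPCG it reduces the iteration count by 3 and improves overall performance by 6%. The paper also analyzes the transfer overhead between the components of the complex heterogeneous system, proposes a task-partitioning method better suited to HPCG, and carries out fine-grained optimizations of the sparse-matrix storage format, sparse-matrix reordering, and memory access. In addition, for multi-process runs, an inner/outer-region partitioning algorithm is used to hide the neighbor communication in the core kernels SpMV and SymGS. In the final full-system test, performance reached 1.67% of the machine's peak, and the full system's weak-scaling parallel efficiency relative to a single node reached 92%.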

8.

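Hardware Task Partitioning for Reconfigurable Computing Systems Combining a Probabilistic Construction Algorithm with a Genetic Algorithm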

陈伟男, 周博, 彭澄廉, Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), 2007, 19(8):960-965

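An algorithm combining a probabilistic construction algorithm with a genetic algorithm is proposed. A measure of the diversity of partitioning results is introduced, the probabilistic construction algorithm uses it to generate a diverse, high-quality initial population, and the genetic algorithm then searches from that population for the optimal solution. Experimental results show that the algorithm obtains better partitions than existing list-based partitioning algorithms and runs faster than a genetic algorithm started from a completely random initial population.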

9.

A DAG Partitioning Algorithm for Hardware-Accelerated Functional Verification

何天祥, 肖正, 陈岑, 刘楚波, 李肯立, Journal of Software (软件学报), 2022, 33(9):3236-3248

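Functional verification is a fundamental step in very large scale integration (VLSI) design. As very large circuits become widespread, verifying an entire circuit on a single processor suffers from serious limitations in both feasibility and efficiency. Hardware-accelerator-based functional verification partitions the whole circuit into several smaller subcircuits and then verifies them in parallel on multiple hardware processors. When the partition offers good parallelism, verification becomes more efficient and turnaround time shorter. Like other partitioning problems in circuit design, circuit partitioning for hardware-accelerated functional verification can be abstracted as a graph partitioning problem. Compared with the traditional graph partitioning problem, however, it must also ensure a small simulation depth and high scheduling parallelism. To meet these requirements, an effective algorithm based on the traditional multilevel graph partitioning strategy is proposed. Incorporating scheduling ideas, it uses the circuit's critical-path and timing information to transform the hardware-accelerated functional verification problem into a multilevel partitioning problem on a directed acyclic graph. Experiments on random circuit netlists show that the algorithm effectively reduces the critical-path length without significantly increasing the edge cut.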
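The algorithm above relies on critical-path information. As generic background rather than the paper's method, the critical path of a node-weighted DAG can be found with one pass in topological order (Kahn's algorithm); the gate names and delays below are hypothetical.

```python
from collections import defaultdict, deque


def critical_path(weights, edges):
    """Longest weighted path in a DAG whose nodes carry delays.

    weights maps node -> delay; edges is a list of (u, v) pairs.
    Returns (path length, list of nodes on the path).
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in weights}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # finish[v]: length of the longest path ending at v, including v itself.
    finish = dict(weights)
    parent = {v: None for v in weights}
    queue = deque(v for v, d in indeg.items() if d == 0)
    while queue:
        u = queue.popleft()  # all predecessors of u are already final
        for v in succ[u]:
            if finish[u] + weights[v] > finish[v]:
                finish[v] = finish[u] + weights[v]
                parent[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    end = max(finish, key=finish.get)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    return finish[end], path[::-1]


delays = {"a": 1, "b": 2, "c": 3, "d": 1}
result = critical_path(delays, [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
print(result)  # (5, ['a', 'c', 'd'])
```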

10.

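A Static Scheduling Algorithm for Cluster Systems Based on DAG Decomposition and Reconstruction (total citations: 5; self-citations: 0; citations by others: 5)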

周佳祥, 郑纬民, Journal of Software (软件学报), 2000, 11(8):1097-1104

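Static task scheduling on cluster systems is NP-complete; typical algorithms use heuristics to obtain suboptimal solutions in polynomial time. The graph decomposition and subgraph reconstruction algorithm proposed in this paper schedules parallel tasks distributed over a directed acyclic graph (DAG) quickly and effectively. The algorithm's complexity is O(log|V| × (|V| + |E|)). It recursively decomposes the task graph and reconstructs subgraphs to form task clusters, completes the task schedule, and performs a preliminary optimization of processor usage. Case analysis and performance comparison with other heuristic scheduling algorithms show that the algorithm is fast, effective, and …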

11.

Heuristic Graph Partitioning Based on Breadth-First-Traversal Weighted Graph Generation

蹇冬宇, 程永利, Computer Systems & Applications (计算机系统应用), 2023, 32(12):218-223

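The quality of a graph partition largely determines the communication overhead and load balance between machines, both of which are critical to the performance of large-scale parallel graph computation. However, as graphs grow ever larger, the execution time of graph partitioning algorithms has become an unavoidable problem, so it is worth studying how to make them run more efficiently. This paper proposes a heuristic graph partitioning method based on generating a weighted graph by breadth-first traversal; it achieves low communication cost and good load balance while adding only a small preprocessing overhead. Experimental results show that the proposed method reduces the replication factor and the communication overhead while introducing little extra running time.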
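Because the evaluation reports a replication factor, the method appears to partition edges (a vertex cut). The abstract does not describe how the weighted graph is generated, so here is a hypothetical baseline instead: order vertices by breadth-first traversal, assign edges to k parts in contiguous chunks of that order, and measure the replication factor (the average number of parts in which each vertex appears). All names are illustrative.

```python
from collections import defaultdict, deque


def bfs_edge_partition(adj, k):
    """Assign each undirected edge to one of k parts in BFS order (hypothetical baseline)."""
    order, seen = [], set()
    for s in sorted(adj):  # restart BFS so every connected component is covered
        if s in seen:
            continue
        seen.add(s)
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    # List each edge once, from its endpoint that comes earlier in BFS order.
    rank = {v: i for i, v in enumerate(order)}
    edges = [(u, v) for u in order for v in adj[u] if rank[u] < rank[v]]
    chunk = -(-len(edges) // k)  # ceiling division: roughly equal part sizes
    return {e: i // chunk for i, e in enumerate(edges)}


def replication_factor(assignment):
    """Average number of parts holding a replica of each (non-isolated) vertex."""
    parts = defaultdict(set)
    for (u, v), p in assignment.items():
        parts[u].add(p)
        parts[v].add(p)
    return sum(len(s) for s in parts.values()) / len(parts)


# A path 0-1-2-3-4-5 split into 2 parts: only vertex 3 is replicated.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
assignment = bfs_edge_partition(path, k=2)
print(replication_factor(assignment))  # 7 / 6, about 1.167
```

BFS order keeps neighboring edges in the same chunk, which is the intuition for why traversal-based orderings tend to lower the replication factor compared with random edge assignment.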

12.

Finding optimal solutions to the graph partitioning problem with heuristic search (total citations: 1; self-citations: 0; citations by others: 1)

Ariel Felner, Annals of Mathematics and Artificial Intelligence, 2005, 45(3-4):293-322

As search spaces become larger and problems scale up, an efficient way to speed up search is to use a more accurate heuristic function. A better heuristic function might be obtained from the following general idea. Many problems can be divided into a set of subproblems and subgoals that must be achieved, and interactions and conflicts between unsolved subgoals might provide useful knowledge for constructing an informed heuristic function. In this paper we demonstrate this idea on the graph partitioning problem (GPP). We first show how to formulate GPP as a search problem and then introduce a sequence of admissible heuristic functions that estimate the size of the optimal partition by examining different interactions between vertices of the graph. We then solve GPP optimally with these heuristics. Experimental results show that our advanced heuristics achieve speedups of up to several orders of magnitude. Finally, we experimentally compare our approach with other state-of-the-art optimal graph partitioning solvers on several classes of graphs; the results show that our algorithm outperforms them in many cases.
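To make "GPP as a search problem" concrete, here is a minimal depth-first branch-and-bound baseline for exact balanced bisection. It assigns vertices to sides one at a time and uses the weakest admissible heuristic, the number of edges already cut among assigned vertices; the paper's stronger heuristics are not reproduced, and the function name is an assumption.

```python
def optimal_bisection(n, edges):
    """Exact minimum balanced bisection by depth-first branch and bound.

    Vertices 0..n-1 (n even) are assigned to side 0 or 1 in index order.
    The cut among already-assigned vertices never exceeds the final cut,
    so it is an admissible lower bound.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [None] * n
    best = [float("inf"), None]  # [best cut, best assignment]

    def search(i, cut, count0):
        if cut >= best[0]:
            return  # bound: cannot improve on the incumbent
        if i == n:
            best[0], best[1] = cut, side.copy()
            return
        for s in (0, 1):
            new0 = count0 + (s == 0)
            # Balance: neither side may exceed n // 2 vertices.
            if new0 > n // 2 or (i + 1 - new0) > n // 2:
                continue
            added = sum(1 for v in adj[i] if v < i and side[v] != s)
            side[i] = s
            search(i + 1, cut + added, new0)
        side[i] = None

    side[0] = 0  # symmetry breaking: vertex 0 always on side 0
    search(1, 0, 1)
    return best[0], best[1]


# Two triangles joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
result = optimal_bisection(6, edges)
print(result)  # (1, [0, 0, 0, 1, 1, 1])
```

Swapping in a tighter admissible estimate of the cut still to come is exactly where better heuristics pay off: the same search then prunes far earlier.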

13.

A Two-Stage Algorithm for the Lossy Graph Summarization Problem

冯康, 陈卫东, Computer Systems & Applications (计算机系统应用), 2023, 32(6):189-196

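The problem is as follows: given a graph G = (V, E) and a positive integer k, merge all nodes of G into k supernodes such that the summary graph formed by these supernodes represents the original graph G within a certain error. This is a combinatorial optimization problem based on graph partitioning; one main approach is to repeatedly sample sets of node pairs at random and use a heuristic to choose which of them to merge. This paper proposes an effective two-stage algorithm, TS_LGS. The algorithm sets a stage threshold based on the average degree of G. While the current number of supernodes exceeds the threshold (stage 1), it merges node pairs in batches, selected from the sampled pairs according to the current best merge score, in order to reduce the number of iterations. Otherwise (stage 2), it uses weighted sampling and gives priority to adjacent node pairs, aiming to find pairs whose merging adds little reconstruction error, until the number of supernodes reaches k. Experiments on typical real-world network graphs against SAA, the best existing algorithm, show that TS_LGS, at lower time complexity, extracts graph summaries with lower reconstruction error and query error.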
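The abstract evaluates summaries by reconstruction error without defining it here. The sketch below assumes one common L1 formulation: the summary keeps only the edge density between each pair of supernodes, every vertex pair is reconstructed with that density, and the error is the sum of absolute differences from the true adjacency. The merge policy of TS_LGS is not reproduced, and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def reconstruction_error(n, edges, supernode):
    """L1 reconstruction error of a lossy summary (assumed definition).

    supernode[v] is the supernode id of vertex v (vertices are 0..n-1).
    """
    size = Counter(supernode)
    edge_set = {frozenset(e) for e in edges}
    between = Counter()  # edges counted per (unordered) supernode pair
    for u, v in edge_set:
        between[frozenset((supernode[u], supernode[v]))] += 1

    def possible_pairs(a, b):
        return size[a] * (size[a] - 1) // 2 if a == b else size[a] * size[b]

    error = 0.0
    for u, v in combinations(range(n), 2):
        a, b = supernode[u], supernode[v]
        density = between[frozenset((a, b))] / possible_pairs(a, b)
        actual = 1 if frozenset((u, v)) in edge_set else 0
        error += abs(actual - density)
    return error


path_edges = [(0, 1), (1, 2), (2, 3)]
print(reconstruction_error(4, path_edges, [0, 0, 1, 1]))  # 1.5
print(reconstruction_error(4, path_edges, [0, 1, 2, 3]))  # 0.0 (no merging, no loss)
```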

14.

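A WSN Node Localization Algorithm Based on a Graph Drawing Algorithm (total citations: 1; self-citations: 1; citations by others: 0)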

张清国, 王敬华, Computer Engineering (计算机工程), 2009, 35(20):25-27

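To address node localization in wireless sensor networks (WSNs), a new localization algorithm based on a Kamada-Kawai-like graph drawing algorithm is proposed. It transforms WSN node localization into a graph drawing problem and uses the classical graph drawing algorithm to obtain the optimal solution, thereby locating the nodes. Simulation results show that the algorithm converges quickly, localizes nodes accurately, and performs well.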

15.

Finding Optimal Ordering of Sparse Matrices for Column-Oriented Parallel Cholesky Factorization

Lin Wen-Yang, The Journal of Supercomputing, 2003, 24(3):259-277

In this paper, we consider the problem of finding fill-preserving sparse matrix orderings for parallel factorization. That is, given a large sparse symmetric positive definite matrix A that has been ordered by some fill-reducing ordering, we want to determine a reordering that preserves the sparsity and minimizes the cost of performing the Cholesky factorization in parallel. Previous research on this problem has all been based on the elimination tree model, in which each node represents the task of factoring one column; this model can therefore be seen as a coarse-grained task dependence model. To exploit more parallelism, Joseph Liu proposed a medium-grained task model, called the column task graph, and showed that it is amenable to shared-memory supercomputers. Based on the column task graph, we devise a greedy reordering algorithm and show that it finds the optimal ordering among the class of all fill-preserving orderings of the given sparse matrix A.
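For reference, the elimination tree mentioned above can be computed directly from the nonzero pattern of A with Liu's classic algorithm using path compression, shown here as a sketch (the input encoding and function name are assumptions). It builds the coarse-grained model only, not the column task graph or the paper's reordering.

```python
def elimination_tree(n, upper_pattern):
    """Elimination tree of a sparse symmetric matrix (Liu's algorithm).

    upper_pattern[j] lists the row indices i < j with A[i][j] != 0.
    Returns parent[], where parent[j] == -1 marks a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcuts toward the current root
    for j in range(n):
        for i in upper_pattern[j]:
            # Climb from i to the root of its current subtree, linking it to j.
            while i != -1 and i < j:
                next_i = ancestor[i]
                ancestor[i] = j  # path compression
                if next_i == -1:
                    parent[i] = j
                i = next_i
    return parent


# Off-diagonal nonzeros (0,2), (1,2), (2,4), (3,4).
print(elimination_tree(5, [[], [], [0, 1], [], [2, 3]]))  # [2, 2, 4, 4, -1]
# Nonzeros (0,1), (0,3): factorization fills in (1,3), so parent[1] == 3.
print(elimination_tree(4, [[], [0], [], [0]]))  # [1, 3, -1, -1]
```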

16.

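Grid Dependent-Task Scheduling Based on Optimal Selection of Task-Resource Assignment Graphs (total citations: 3; self-citations: 0; citations by others: 3)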

陈廷伟, 张斌, 郝宪文, Journal of Computer Research and Development (计算机研究与发展), 2007, 44(10):1741-1750

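Task scheduling is the key to high performance in grid application systems. In grid computing, a large application is often decomposed into multiple tasks with dependencies among them. In a wide-area grid environment where resources differ greatly from one another, these inter-task dependencies pose new challenges for traditional scheduling strategies. The main work of task scheduling is to allocate resources to tasks and to determine their execution order. This paper represents the possible resource-allocation schemes for dependent tasks as a task-resource assignment graph (T-RAG) and, on that basis, proposes a dependent-task scheduling model based on optimal T-RAG selection, which turns dependent-task scheduling into an optimal graph-selection problem: parsing the optimal task-resource assignment graph determines both the resource allocation and the task execution order, which together constitute the optimal schedule. Finally, a task scheduling algorithm based on this model is implemented; a comparison with the ILHA algorithm shows that the proposed algorithm performs better when resources differ greatly and large amounts of data are transferred between tasks.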

17.

Granularity Analysis for Exploiting Adaptive Parallelism of Declarative Programs on Multiprocessors

田新民, 王鼎兴, 沈美明, 郑纬民, 温冬婵, Journal of Computer Science and Technology (计算机科学技术学报), 1994, (2)

1 Introduction. Automatic parallel execution of declarative language programs (e.g. functional programs and logic programs) is attractive, as it makes the use of parallel computers very easy, and the programmer need not be concerned with the specifics of the underlying parallel architecture. However, if several processors are executing concurrently, exploiting adaptive parallelism is hard due to the non-determinism of task granularity and the data dependencies among tasks. The early solution proposed by Conery and Kibler [2] uses an ordering algorithm to determine dependencies at run…

18.

Compact DAG Representation and Its Dynamic Scheduling

Michel Cosnard, Emmanuel Jeannot, Journal of Parallel and Distributed Computing, 1999, 58(3):143

Scheduling large task graphs is an important issue in parallel computing. In this paper we tackle the following two problems: (1) how to schedule a task graph when it is too large to fit into memory, and (2) how to build a generic program such that the parameter values of a task graph can be given at run time. Our answers feature the parameterized task graph (PTG), a symbolic representation of the task graph. We propose a dynamic scheduling algorithm that takes a PTG as input and allows us to generate a generic program. We present a theoretical study showing that our algorithm finds good schedules for coarse-grain task graphs and has low memory cost and low computational complexity. When the average number of operations per task is large enough, we prove that the scheduling overhead is negligible with respect to the makespan. We also provide experimental results that demonstrate the feasibility of our approach using several compute-intensive kernels found in numerical scientific applications.

19.

A Parallel Task Allocation Strategy for 3D Meshes Based on Multilevel k-Way Partitioning

于方, 郑晓薇, 孙晓鹏, Computer Engineering and Design (计算机工程与设计), 2010, 31(2)

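To address the high communication overhead that traditional task-partitioning methods incur during the task-allocation phase of parallel computation on 3D meshes, a parallel task allocation strategy based on the multilevel k-way partitioning algorithm is proposed. The 3D mesh is first partitioned with the multilevel k-way algorithm, which turns task partitioning into a graph partitioning problem; a parallel task-mapping algorithm then assigns the computing tasks to compute nodes based on the partitioning result. Experiments solving shortest-path problems on 3D mesh models on the DeepComp 1800 show that, compared with the traditional row/column task allocation strategy, this strategy maintains load balance while effectively reducing communication overhead, shortening the running time and improving the speedup.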

20.

Parallel Test Task Scheduling Based on Graph Coloring Theory and a Genetic Bee Colony Algorithm

吴勇, 王雪, 赵焕义, Journal of Computer Applications (计算机应用), 2015, 35(5):1280-1283

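To address the key problem of optimal task scheduling in parallel testing, a task-scheduling optimization algorithm combining graph coloring theory with a genetic bee colony algorithm is proposed. First, a parallel-test task relationship model based on graph coloring theory is built, in which a graph describes how test tasks occupy instrument resources. Then, on top of this model, the crossover and mutation operations of genetic algorithms are combined with the artificial bee colony (ABC) algorithm to search for the optimal solution, which effectively avoids premature convergence and speeds up convergence. Finally, the task grouping with maximum parallelism is obtained. Simulation shows that the proposed method effectively realizes parallel testing and improves the testing efficiency of automatic test systems.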
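The graph-coloring model above can be illustrated with a simple baseline: tests that share an instrument conflict, and each color class of the conflict graph is a group of tests that can run in parallel. This sketch uses greedy largest-degree-first (Welsh-Powell style) coloring rather than the paper's genetic bee colony search; the test and instrument names are hypothetical.

```python
def group_tests(tasks):
    """Group tests into parallel batches by greedy coloring of a conflict graph.

    tasks maps each test name to the set of instruments it occupies; two tests
    conflict if they share an instrument. Tests are colored in order of
    decreasing conflict degree; each color class can run concurrently.
    """
    names = list(tasks)
    conflicts = {t: {u for u in names if u != t and tasks[t] & tasks[u]} for t in names}
    order = sorted(names, key=lambda t: len(conflicts[t]), reverse=True)
    color = {}
    for t in order:
        used = {color[u] for u in conflicts[t] if u in color}
        color[t] = next(c for c in range(len(names)) if c not in used)
    groups = {}
    for t, c in color.items():
        groups.setdefault(c, []).append(t)
    return [sorted(g) for _, g in sorted(groups.items())]


tests = {"T1": {"DMM"}, "T2": {"DMM", "Scope"}, "T3": {"Scope"}, "T4": {"PSU"}}
print(group_tests(tests))  # [['T2', 'T4'], ['T1', 'T3']]
```

A metaheuristic such as the one in the abstract would search over colorings (or orderings) to reduce the number of groups or balance their durations, where this greedy pass simply accepts the first feasible coloring.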