期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

DSC: scheduling parallel tasks on an unbounded number of processors 总被引：1，自引：0，他引：1

Tao Yang Gerasoulis A. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(9):951-967

We present a low-complexity heuristic, named the dominant sequence clustering algorithm (DSC), for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is on average, comparable to, or even better than, other higher-complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAG's) is NP-complete. DSC finds optimal schedules for special classes of DAG's, such as fork, join, coarse-grain trees, and some fine-grain trees. It guarantees a performance within a factor of 2 of the optimum for general coarse-grain DAG's. We compare DSC with three higher-complexity general scheduling algorithms: the ETF by J.J. Hwang, Y.C. Chow, F.D. Anger, and C.Y. Lee (1989); V. Sarkar's (1989) clustering algorithm; and the MD by M.Y. Wu and D. Gajski (1990). We also give a sample of important practical applications where DSC has been found useful 相似文献

2.

Task clustering and scheduling for distributed memory parallelarchitectures

Palis M.A. Jing-Chiou Liou Wei D.S.L. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(1):46-55

This paper addresses the problem of scheduling parallel programs represented as directed acyclic task graphs for execution on distributed memory parallel architectures. Because of the high communication overhead in existing parallel machines, a crucial step in scheduling is task clustering, the process of coalescing fine grain tasks into single coarser ones so that the overall execution time is minimized. The task clustering problem is NP-hard, even when the number of processors is unbounded and task duplication is allowed. A simple greedy algorithm is presented for this problem which, for a task graph with arbitrary granularity, produces a schedule whose makespan is at most twice optimal. Indeed, the quality of the schedule improves as the granularity of the task graph becomes larger. For example, if the granularity is at least 1/2, the makespan of the schedule is at most 5/3 times optimal. For a task graph with n tasks and e inter-task communication constraints, the algorithm runs in O(n(n lg n+e)) time, which is n times faster than the currently best known algorithm for this problem. Similar algorithms are developed that produce: (1) optimal schedules for coarse grain graphs; (2) 2-optimal schedules for trees with no task duplication; and (3) optimal schedules for coarse grain trees with no task duplication 相似文献

3.

An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems 总被引：2，自引：0，他引：2

Bansal S. Kumar P. Singh K. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(6):533-544

Scheduling precedence constrained task graphs, with or without duplication, is one of the most challenging NP-complete problems in parallel and distributed computing systems. Duplication heuristics are more effective, in general, for fine grain task graphs and for networks with high communication latencies. However, most of the available duplication algorithms are designed under the assumption of unbounded availability of fully connected processors, and lie in a high complexity range. Low complexity optimal duplication algorithms work under restricted cost and/or shape parameters for the task graphs. Further, the required number of processors grows in proportion to the task-graph size significantly. An improved duplication strategy is proposed that works for arbitrary task graphs, with a limited number of interconnection-constrained processors. Unlike most other algorithms that replicate all possible parents/ancestors of a given task, the proposed algorithm tends to avoid redundant duplications and duplicates the nodes selectively, only if it helps in improving the performance. This results in lower duplications and also lower time and space complexity. Simulation results are presented for clique and an interconnection-constrained network topology with random and regular benchmark task graph suites, representing a variety of parallel numerical applications. Performance, in terms of normalized schedule length and efficiency, is compared with some of the well-known and recently proposed algorithms. The suggested algorithm turns out to be most efficient, as it generates better or comparable schedules with remarkably less processor consumption. 相似文献

4.

A fast recursive mapping algorithm

Song Chen Mary M. Eshaghian 《Concurrency and Computation》1995,7(5):391-409

The paper presents a generic technique for mapping parallel algorithms onto parallel architectures. The proposed technique is a fast recursive mapping algorithm which is a component of the Cluster-M programming tool. The other components of Cluster-M are the Specification module and the Representation module. In the Specification module, for a given task specified by a high-level machine-independent program, a clustered task graph called Spec graph is generated. In the Representation module, for a given architecture or computing organization, a clustered system graph called Rep graph is generated. Given a task (or system) graph, a Spec (or Rep) graph can be generated using one of the clustering algorithms presented in the paper. The clustering is done only once for a given task graph (system graph) independent of any system graphs (task graphs). It is a machine-independent (application-independent) clustering, and therefore it is not repeated for different mappings. The Cluster-M mapping algorithm presented produces a sub-optimal matching of a given Spec graph containing M task modules, onto a Rep graph of N processors, in O(MN) time. This generic algorithm is suitable for both the allocation problem and the scheduling problem. Its performance is compared to other leading techniques. We show that Cluster-M produces better or similar results in significantly less time and using fewer or an equal number of processors as compared to the other known methods. 相似文献

5.

Combining K-Means and K-Harmonic with Fish School Search Algorithm for data clustering task on graphics processing units

《Applied Soft Computing》2016

Data clustering is related to the split of a set of objects into smaller groups with common features. Several optimization techniques have been proposed to increase the performance of clustering algorithms. Swarm Intelligence (SI) algorithms are concerned with optimization problems and they have been successfully applied to different domains. In this work, a Swarm Clustering Algorithm (SCA) is proposed based on the standard K-Means and on K-Harmonic Means (KHM) clustering algorithms, which are used as fitness functions for a SI algorithm: Fish School Search (FSS). The motivation is to exploit the search capability of SI algorithms and to avoid the major limitation of falling into locally optimal values of the K-Means algorithm. Because of the inherent parallel nature of the SI algorithms, since the fitness function can be evaluated for each individual in an isolated manner, we have developed the parallel implementation on GPU of the SCAs, comparing the performances with their serial implementation. The interest behind proposing SCA is to verify the ability of FSS algorithm to deal with the clustering task and to study the difference of performance of FSS-SCA implemented on CPU and on GPU. Experiments with 13 benchmark datasets have shown similar or slightly better quality of the results compared to standard K-Means algorithm and Particle Swarm Algorithm (PSO) algorithm. There results of using FSS for clustering are promising. 相似文献

6.

On exploiting task duplication in parallel program scheduling 总被引：1，自引：0，他引：1

Ishfaq Ahmad Yu-Kwong Kwok 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(9):872-892

One of the main obstacles in obtaining high performance from message-passing multicomputer systems is the inevitable communication overhead which is incurred when tasks executing on different processors exchange data. Given a task graph, duplication-based scheduling can mitigate this overhead by allocating some of the tasks redundantly on more than one processor. In this paper, we focus on the problem of using duplication in static scheduling of task graphs on parallel and distributed systems. We discuss five previously proposed algorithms and examine their merits and demerits. We describe some of the essential principles for exploiting duplication in a more useful manner and, based on these principles, propose an algorithm which outperforms the previous algorithms. The proposed algorithm generates optimal solutions for a number of task graphs. The algorithm assumes an unbounded number of processors. For scheduling on a bounded number of processors, we propose a second algorithm which controls the degree of duplication according to the number of available processors. The proposed algorithms are analytically and experimentally evaluated and are also compared with the previous algorithms 相似文献

7.

CCTD：一种通信限制下的Fork-Join任务调度算法

梁珊珊吴佳骏张军超《计算机科学》2009,36(6):282-285

现代并行系统的复杂调度问题可以转化为Fork-join图的任务调度问题.然而在实际计算环境中,两个处理节点之间的通信大多以独占方式进行,现有的大多数任务调度算法往往忽略了对通信信道独占性的考虑.提出了一种带通信限制的Fork-join图调度算法CCTD.该算法引入了实际环境中的通信独占性限制,同时保证了Fork-join图的基于复制的优化调度,而且尽可能地减少了对处理器占用.实验结果表明,CCTD算法是一种适应性强的、高效的Fork-join图调度算法. 相似文献

8.

一个调度Fork-Join任务图的新算法 总被引：17，自引：1，他引：16

刘振英方滨兴姜誉张毅赵宏《软件学报》2002,13(4):693-697

任务调度是影响工作站网络效率的关键因素之一.Fork-Join任务图可以代表很多并行结构,但其他已有调度Fork-Join任务图算法忽略了在非全互连工作站网络环境中通信之间不能并行执行的问题,有些效率高的算法又没有考虑节省处理器个数的问题.因此,专门针对该任务图,综合考虑调度长度、非并行通信和节省处理器个数问题,提出了一个基于任务复制的静态调度算法TSA_FJ.通过随机产生任务的执行时间和通信时间,生成了多个Fork-Join任务图,并且采用TSA_FJ算法和其他调度算法对生成的任务图进行调度.结果表明, 相似文献

9.

基于有效并行求解策略的显式有限元分析并行算法

付朝江王天奇林悦荣《计算机应用》2018,38(4):1072-1077

针对大规模结构非线性动力问题的有限元分析非常耗时,基于消息传递接口（MPI）机群环境,提出多种基于并行求解策略的显式有限元并行算法。基于显式消息传递的区域分解技术,采取重叠、非重叠区域分解技术及动态任务分配方法,通过将计算与通信重叠,优化处理器间的通信,对非重叠通信区域分解并行算法、重叠通信区域分解并行算法、群动态任务分配算法、动态任务分配算法及动态负载平衡算法进行研究。为在机群环境下实现非线性动力有限元分析,开发了基于有效并行求解策略的显式有限元并行算法。编写了基于消息传递编程模式的并行有限元程序,在工作站机群上实现了数值算例,分析了算法的性能,并与传统的Newmark算法进行了比较。算例表明：群动态任务分配算法的性能优于动态任务分配算法,低于区域分解算法的性能,动态负载平衡算法最优。对相同规模的问题提出的算法比Newmark算法快,优于Newmark算法。对结构非线性动力问题的有限元分析,所提出的并行算法是可行有效的。相似文献

10.

一种基于分簇复制的DAG任务图调度算法

乔伟光曾国荪《计算机工程》2006,32(17):126-128

并行任务调度是影响机群计算效率的关键因素之一，机群环境DAG(Directed Acyclic Graph)任务图调度是一个NP完全问题，只能寻求启发式算法。已有的研究中，图解重构算法在允许任务复制的条件下，通过对DAG图递归分解与子图重构，初步实现了一个可行的调度方案。该文在此基础上，提出了以调度长度增量为依据的任务复制策略，利用该策略调整受制约节点的同簇前驱，解决了任务簇间的时间制约问题，缩短了调度长度；通过合理地选择任务簇进行合并，增大任务簇的粒度，提高了处理器的利用率。提出的以任务簇扩展-合并为特征、以分簇复制为手段的DAG图调度算法，改进和拓展了图解重构方法。实例分析表明本算法复杂度与TDS (Task Duplication Scheduling)相同，但性能更优。相似文献

11.

嵌入式并行系统中基于任务优化的调度算法

下载免费PDF全文

刘勇尹增山杨根庆《计算机工程》2008,34(2):11-13

在嵌入式并行计算系统中,任务调度是决定系统性能的关键。多任务调度中,启发式调度法是一种设计简单且性能良好的调度方法。目前的调度算法大多是基于任务复制的,没有充分考虑前驱任务与其后继任务间的相关性。该文提出了一种基于相关任务优化(DTO)的调度算法,通过分析已用处理机的负载和空闲时间,尽量减少系统的调度长度和处理机数目。算法分析结果表明,DTO算法在性能上优于其他算法,对嵌入式并行计算系统中的多任务调度是一个较好的选择。相似文献

12.

一个调度Out-Tree任务图的启发式算法

下载免费PDF全文

旷文张建军蒋宏《计算机工程与应用》2013,49(12):47-49

任务调度问题是并行分布式计算中的挑战性问题之一。大多数实际的调度算法是启发式的因而常常具有改进的余地。针对Out-Tree任务图这一基本结构提出一个基于任务复制的启发式调度算法,该算法在确保最短调度长度的同时,注重处理器的负载平衡,以达到节约处理器的目的。比较性实验的结果表明,该算法确保了最短调度长度且使用的处理器最少。因而,该算法提高了系统的利用率,避免消耗过多的资源,实际应用性更好。相似文献

13.

Efficient parallel algorithm for pixel classification in remote sensing imagery

Ujjwal Maulik Anasua Sarkar 《GeoInformatica》2012,16(2):391-407

An important approach for image classification is the clustering of pixels in the spectral domain. Fast detection of different land cover regions or clusters of arbitrarily varying shapes and sizes in satellite images presents a challenging task. In this article, an efficient scalable parallel clustering technique of multi-spectral remote sensing imagery using a recently developed point symmetry-based distance norm is proposed. The proposed distributed computing time efficient point symmetry based K-Means technique is able to correctly identify presence of overlapping clusters of any arbitrary shape and size, whether they are intra-symmetrical or inter-symmetrical in nature. A Kd-tree based approximate nearest neighbor searching technique is used as a speedup strategy for computing the point symmetry based distance. Superiority of this new parallel implementation with the novel two-phase speedup strategy over existing parallel K-Means clustering algorithm, is demonstrated both quantitatively and in computing time, on two SPOT and Indian Remote Sensing satellite images, as even K-Means algorithm fails to detect the symmetry in clusters. Different land cover regions, classified by the algorithms for both images, are also compared with the available ground truth information. The statistical analysis is also performed to establish its significance to classify both satellite images and numeric remote sensing data sets, described in terms of feature vectors. 相似文献

14.

基于任务复制的处理器预分配算法 总被引：12，自引：2，他引：12

周双娥袁由光熊兵周欧中红《计算机学报》2004,27(2):216-223

基于任务复制的调度算法比无任务复制的调度算法具有较好的性能．文章在分析了基于任务复制的几个典型算法(如TDS，OSA等算法)及其假设条件后，提出了以使调度长度最短作为主要目标、减少处理机数目作为次要目标的处理器预分配算法PPA．该算法对任务计算时间与任务间通信时间未做任何限制(即不考虑任务粒度)．通过与相关工作的比较可以看出：PPA算法在调度长度与处理器使用数目上均优于其它算法或与其它算法相当，同时，该算法具有与TDS，OSA相同的时间复杂度．这对嵌入式实时分布系统具有重要的意义。相似文献

15.

异构计算中体系结构感知的并行任务分簇方法

郝水侠曾国荪《计算机科学》2013,40(3):121-125

异构计算是高效能计算发展的必然趋势,针对异构计算运行中并行任务和体系结构难匹配的问题,提出了实现并行任务和体系结构匹配的并行任务分簇方法。首先给出效能的概念及异构计算中体系结构感知的分簇问题,然后从理论上分析了异构匹配与效能的关系,提出了实现异构计算匹配和结构匹配的分簇理论,目的是发挥异构计算中机器的潜能,协同处理并行任务,实现高效能。在此基础上,给出相应的算法。最后通过仿真实验说明,该方法可通过簇图与体系结构的匹配缩短通信开销在执行时间上所占的比例,从而缩短并行执行时间,以提高系统利用率,最终实现异构计算的高效能。相似文献

16.

一种基于任务复制调度算法研究

林剑柠吴慧中《小型微型计算机系统》2006,27(7):1296-1299

分布式计算系统中任务调度是NP完全问题，调度算法可以分为任务复制和无任务复制两类．本文在简述了传统TDS算法的缺陷后，提出了一种改进的TDS任务调度算法-MTDS,该算法基于异构计算系统的特点，采用动态DAG图，尽可能的提前每个任务的执行时间，缩短所有任务完成的执行时间；并且避免出现在某一个执行序列中由于某一任务执行时间过长，而影响整个程序的执行时间．相似文献

17.

Contention-aware scheduling with task duplication 总被引：1，自引：0，他引：1

Oliver Sinnen^{Author Vitae} Andrea ToAuthor VitaeManpreet KaurAuthor Vitae 《Journal of Parallel and Distributed Computing》2011,71(1):77-86

Finding an efficient schedule for a task graph on several processors is a trade-off between maximising concurrency and minimising interprocessor communication. Task duplication is a technique that has been employed to reduce or avoid interprocessor communication. Certain tasks are duplicated on several processors to produce the data locally and avoid the communication among processors. Most of the algorithms using task duplication have been proposed for the classic scheduling model, which allows concurrent communication and ignores contention for communication resources. It is increasingly recognised that this classic model is unrealistic and does not permit creating accurate and efficient schedules. The recently proposed contention model introduces contention awareness into task scheduling by assigning the edges of the task graph to the links of the communication network. It is intuitive that scheduling under such a model benefits even more from task duplication, yet no such algorithm has been proposed as it is not trivial to duplicate tasks under the contention model. This paper proposes a contention-aware task duplication scheduling algorithm. We investigate the fundamentals for task duplication in the contention model and propose an algorithm that is based on state-of-the-art techniques found in task duplication and contention-aware algorithms. An extensive experimental evaluation demonstrates the significant improvements to the speedup of the produced schedules. 相似文献

18.

基于MapReduce的分布式近邻传播聚类算法 总被引：2，自引：0，他引：2

鲁伟明杜晨阳魏宝刚沈春辉叶振超《计算机研究与发展》2012,49(8):1762-1772

随着信息技术迅速发展,数据规模急剧增长,大规模数据处理非常具有挑战性.许多并行算法已被提出,如基于MapReduce的分布式K平均聚类算法、分布式谱聚类算法等.近邻传播(affinity propagation,AP)聚类能克服K平均聚类算法的局限性,但是处理海量数据性能不高.为有效实现海量数据聚类,提出基于MapReduce的分布式近邻传播聚类算法——DisAP.该算法先将数据点随机划分为规模相近的子集,并行地用AP聚类算法稀疏化各子集,然后融合各子集稀疏化后的数据再次进行AP聚类,由此产生的聚类代表作为所有数据点的聚类中心.在人工合成数据、人脸图像数据、IRIS数据以及大规模数据集上的实验表明:DisAP算法对数据规模有很好的适应性,在保持AP聚类效果的同时可有效缩减聚类时间. 相似文献

19.

Task graph pre-scheduling, using Nash equilibrium in game theory 总被引：1，自引：1，他引：0

Marjan Abdeyazdan Saeed Parsa Amir Masoud Rahmani 《The Journal of supercomputing》2013,64(1):177-203

Prescheduling algorithms are targeted at restructuring of task graphs for optimal scheduling. Task graph scheduling is a NP-complete problem. This article offers a prescheduling algorithm for tasks to be executed on the networks of homogeneous processors. The proposed algorithm merges tasks to minimize their earliest start time while reducing the overall completion time. To this end, considering each task as a player attempting to reduce its earliest time as much as possible, we have applied the idea of Nash equilibrium in game theory to determine the most appropriate merging. Also, considering each level of a task graph as a player, seeking for distinct parallel processors to execute each of its independent tasks in parallel with the others, the idea of Nash equilibrium in game theory can be applied to determine the appropriate number of processors in a way that the overall idle time of the processors is minimized and the throughput is maximized. The communication delay will be explicitly considered in the comparisons. Our experiments with a number of known benchmarks task graphs and also two well-known problems of linear algebra, LU decomposition and Gauss–Jordan elimination, demonstrate the distinguished scheduling results provided by applying our algorithm. In our study, we consider ten scheduling algorithms: min–min, chaining, A ^?, genetic algorithms, simulated annealing, tabu search, HLFET, ISH, DSH with task duplication, and our proposed algorithm (PSGT). 相似文献

20.

并行任务图的优化调度算法

李于锋莫则尧肖永浩熊敏《计算机工程与科学》2019,41(6):955-962

科学与工程计算中的很多复杂应用问题需要使用科学工作流技术,超算领域中的科学工作流常以并行任务图建模,并行任务图的有效调度对应用的高效执行有重要意义。给出了资源限制条件下并行任务图的调度模型;针对Fork-Join类并行任务图给出了若干最优化调度结论;针对一般并行任务图提出了一种新的调度算法,该算法考虑了数据通信开销对资源分配和调度性能的影响,并对已有的CPA算法在特定情况下进行了改进。通过实验与常用的CPR和CPA算法做比较,验证了提出的新算法能够获得很好的调度效果。本文提出的调度算法和得到的最优调度结论对工作流应用系统的高性能调度功能开发具有借鉴意义。相似文献