Similar literature
20 similar documents found (search time: 46 ms)
1.
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task and data parallelism, and combining the two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task and data parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors, such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the intertask data communication volumes. A locality-conscious scheduling strategy is used to improve intertask data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications and synthetic graphs shows that our algorithm consistently generates schedules with a lower makespan than Critical Path Reduction (CPR) and Critical Path and Allocation (CPA), two previously proposed scheduling algorithms. Our algorithm also produces schedules that have a lower makespan than pure task-parallel and pure data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
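The abstract above refers to lower bounds on the makespan. As a rough illustration only (not the algorithm from that paper), the sketch below computes a textbook bound under a simplifying assumption that each task runs on exactly one processor: the makespan can be no smaller than the critical-path length and no smaller than the total work divided by the processor count. The function name `makespan_lower_bound` and the example graph are hypothetical.

```python
from collections import deque

def makespan_lower_bound(durations, edges, num_procs):
    """Return max(critical-path length, total work / num_procs).

    Assumption for this sketch: every task is sequential (one processor),
    and communication costs are ignored.
    """
    succs = {t: [] for t in durations}
    indeg = {t: 0 for t in durations}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    # Longest path ending at each task, filled in topological order.
    finish = {t: durations[t] for t in durations}
    ready = deque(t for t in durations if indeg[t] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in succs[u]:
            finish[v] = max(finish[v], finish[u] + durations[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(durations):
        raise ValueError("task graph contains a cycle")

    critical_path = max(finish.values(), default=0)
    average_load = sum(durations.values()) / num_procs
    return max(critical_path, average_load)

# Hypothetical fork-join graph: a -> {b, c} -> d
example_durations = {"a": 2, "b": 3, "c": 4, "d": 1}
example_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
```

With two processors the critical path (a, c, d) dominates; with one processor the total work dominates.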

2.
On the granularity and clustering of directed acyclic task graphs   Cited by: 1 (self-citations: 0, citations by others: 1)
The authors consider the impact of granularity on scheduling task graphs. Scheduling consists of two parts: the assignment of tasks to processors, also called clustering, and the ordering of tasks for execution on each processor. The authors introduce two types of clustering: nonlinear and linear. A clustering is nonlinear if two parallel tasks are mapped to the same cluster; otherwise it is linear. Linear clustering fully exploits the natural parallelism of a given directed acyclic task graph (DAG), while nonlinear clustering sequentializes independent tasks and thereby reduces parallelism. The authors also introduce a new quantification of the granularity of a DAG and define a coarse-grain DAG as one whose granularity is greater than one. It is proved that every nonlinear clustering of a coarse-grain DAG can be transformed into a linear clustering whose parallel time is less than or equal to that of the nonlinear one. This result is used to prove the optimality of some important linear clusterings used in parallel numerical computing.
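The linear/nonlinear distinction in the abstract above can be made concrete with a small check. This is a minimal sketch, assuming the task graph is given as a successor map and a clustering as a list of task lists; `reachable`, `is_linear`, and the example graph are illustrative names, not taken from the paper. Two tasks are treated as parallel (independent) when neither is reachable from the other.

```python
from itertools import combinations

def reachable(succs):
    """Map each task to the set of tasks reachable from it (iterative DFS)."""
    closure = {}
    for start in succs:
        seen, stack = set(), list(succs[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(succs[node])
        closure[start] = seen
    return closure

def is_linear(succs, clusters):
    """True if no cluster contains two independent (parallel) tasks."""
    closure = reachable(succs)
    for cluster in clusters:
        for u, v in combinations(cluster, 2):
            if v not in closure[u] and u not in closure[v]:
                return False  # u and v could run in parallel: nonlinear
    return True

# Hypothetical fork-join graph: a -> {b, c} -> d
fork_join = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

For this graph, `[["a", "b", "d"], ["c"]]` is linear because every pair in each cluster lies on a common path, whereas `[["a", "b", "c"], ["d"]]` is nonlinear because `b` and `c` are independent.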

3.
Programming with parallel tasks leads to task graphs with dependencies representing a parallel program. Scheduling algorithms are employed to find an efficient execution order of the parallel tasks. A large variety of scheduling algorithms exist, including layer-based scheduling algorithms for homogeneous target platforms that build consecutive layers of independent parallel tasks and schedule each layer separately. Although these scheduling algorithms provide good results in terms of scheduling algorithm runtime and schedule execution time, the resulting schedules leave room for optimization. This article proposes an optimization for arbitrary layer-based scheduling algorithms, called the Move-blocks algorithm. Given a layer-based schedule of the parallel tasks, this algorithm moves blocks of parallel tasks into preceding layers in order to reduce the overall execution time of a task-based application. Suitable blocks of parallel tasks are identified by the Find-blocks algorithm, which is employed together with the Move-blocks algorithm. The Move-blocks algorithm is applied to four well-known scheduling algorithms. A detailed evaluation for a wide range of test cases is given. Copyright © 2010 John Wiley & Sons, Ltd.

4.
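Complex scheduling problems in modern parallel systems can be reduced to task scheduling on Fork-join graphs. In real computing environments, however, communication between two processing nodes is mostly exclusive, and most existing task scheduling algorithms ignore the exclusivity of communication channels. This paper proposes CCTD, a Fork-join graph scheduling algorithm with communication constraints. The algorithm incorporates the communication-exclusivity constraint found in real environments while still guaranteeing duplication-based optimized scheduling of the Fork-join graph, and it keeps processor usage as low as possible. Experimental results show that CCTD is an adaptable and efficient Fork-join graph scheduling algorithm.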

5.
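Task scheduling in distributed computing systems is an NP-complete problem, and scheduling algorithms fall into two classes: those with task duplication and those without. After briefly describing the shortcomings of the traditional TDS algorithm, this paper proposes MTDS, an improved TDS task scheduling algorithm. Based on the characteristics of heterogeneous computing systems, the algorithm uses a dynamic DAG to start each task as early as possible and to shorten the completion time of all tasks. It also prevents a single long-running task in one execution sequence from delaying the whole program.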

6.
On parallelizing the multiprocessor scheduling problem   Cited by: 1 (self-citations: 0, citations by others: 1)
Existing heuristics for scheduling a node- and edge-weighted directed task graph onto multiple processors can produce satisfactory solutions but incur high time complexities, which tend to worsen in more realistic environments with relaxed assumptions. Consequently, these heuristics do not scale well and cannot handle problems of moderate size. A natural approach to reducing complexity, while aiming for a similar or potentially better solution, is to parallelize the scheduling algorithm. This can be done by partitioning the task graph and concurrently generating partial schedules for the partitioned parts, which are then concatenated to obtain the final schedule. The problem, however, is nontrivial, as there exist dependencies among the nodes of a task graph that must be preserved to generate a valid schedule. Moreover, the time clock for scheduling is global for all the processors (those executing the parallel scheduling algorithm), making the inherent parallelism invisible. In this paper, we introduce a parallel algorithm that is guided by a systematic partitioning of the task graph to perform scheduling using multiple processors. The algorithm schedules both the tasks and the messages, is suitable for graphs with arbitrary computation and communication costs, and is applicable to systems with arbitrary network topologies using homogeneous or heterogeneous processors. We have implemented the algorithm on the Intel Paragon and compared it with three closely related algorithms. The experimental results indicate that our algorithm yields higher-quality solutions while using scheduling times that are an order of magnitude smaller. The algorithm also exhibits an interesting trade-off between solution quality and speedup while scaling well with the problem size.

7.
DSC: scheduling parallel tasks on an unbounded number of processors   Cited by: 1 (self-citations: 0, citations by others: 1)
We present a low-complexity heuristic, named the dominant sequence clustering algorithm (DSC), for scheduling parallel tasks on an unbounded number of completely connected processors. The performance of DSC is, on average, comparable to or even better than that of other higher-complexity algorithms. We assume no task duplication and nonzero communication overhead between processors. Finding the optimum solution for arbitrary directed acyclic task graphs (DAGs) is NP-complete. DSC finds optimal schedules for special classes of DAGs, such as fork, join, coarse-grain trees, and some fine-grain trees. It guarantees a performance within a factor of 2 of the optimum for general coarse-grain DAGs. We compare DSC with three higher-complexity general scheduling algorithms: the ETF by J.J. Hwang, Y.C. Chow, F.D. Anger, and C.Y. Lee (1989); V. Sarkar's (1989) clustering algorithm; and the MD by M.Y. Wu and D. Gajski (1990). We also give a sample of important practical applications where DSC has been found useful.

8.

Cloud computing is a popular and widely adopted computing platform for the execution of scientific workflows, as it provides flexible infrastructure and offers access to a collection of autonomous heterogeneous resources. Effective scheduling of computationally complex workflows that contain many interconnected tasks is a complex problem and becomes more challenging in a cloud environment. Optimal solutions can be obtained by considering not only the heterogeneity of the computation costs involved but also the communication costs among the tasks, so that the schedule length of the application is reduced. In this paper, we propose a list scheduling heuristic, namely minimal optimistic processing time (MOPT), with an optimized duplication approach. The additional feature is introduced for the entry task and is applied only in scenarios in which duplication is more practical and effective. The prioritization phase of the proposed work is based on an optimistic processing time matrix that is used for ranking the tasks. The algorithm has the same time complexity as state-of-the-art existing algorithms, but notable improvements are obtained in terms of makespan and other performance evaluation parameters. Extensive experimental analysis of the proposed algorithm is carried out using synthesized graphs and graphs from real-world applications. The results show that MOPT achieves quality schedules with reduced makespans. As the communication cost among the tasks grows, the proposed algorithm becomes more effective, which indicates that the MOPT algorithm is well suited to communication-intensive applications.


9.
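A new algorithm for scheduling Fork-Join task graphs   Cited by: 17 (self-citations: 1, citations by others: 16)
Liu Zhenying, Fang Binxing, Jiang Yu, Zhang Yi, Zhao Hong. Journal of Software (《软件学报》), 2002, 13(4): 693-697
Task scheduling is one of the key factors affecting the efficiency of a network of workstations. Fork-Join task graphs can represent many parallel structures, but existing algorithms for scheduling them ignore the fact that communications cannot proceed in parallel in a network of workstations that is not fully connected, and some of the more efficient algorithms do not consider reducing the number of processors used. Therefore, specifically for this kind of task graph, a duplication-based static scheduling algorithm, TSA_FJ, is proposed that jointly considers schedule length, non-parallel communication, and the number of processors used. Multiple Fork-Join task graphs were generated with randomly chosen task execution and communication times, and the generated graphs were scheduled with TSA_FJ and with other scheduling algorithms. The results show that …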

10.
Programming models for distributed systems often construct a task graph for the program to be executed on a distributed system of processors. While the topology of the task graph can be constructed from the program structure, the task execution times and data transfer costs between tasks often depend on the input data, or more specifically, on the particular problem instance. Though this indicates that the optimal schedule of a task graph cannot be determined until the input data is available, it is possible to estimate the worst-case processor requirement for the optimal schedule of a program solely from the topology of its task graph. In this paper, we study the problem of estimating worst-case processor requirements for scheduling (with cloning) layered task graphs based on their topology. We show that computing an accurate processor bound for layered graphs is NP-hard (even for two layers) and present a polynomial-time algorithm that computes an upper bound on the processor requirement. We show that the algorithm provides tight bounds for several common classes of layered task graphs.

11.

SRAM-based FPGAs feature high performance and flexibility. Thus, they have found many applications in modern high-performance computing (HPC) systems. These systems suffer from limited computing resources for running HPC applications. Therefore, multi-FPGA systems have emerged to alleviate such resource limitations. In this regard, efficient scheduling strategies are required to dynamically steer the execution of applications, represented as task graphs, on a set of connected FPGAs. In this paper, a heuristic-based dynamic critical path-aware scheduling technique named CPA is presented to schedule task graphs on multi-FPGA systems. By considering the computation and communication capabilities of the FPGAs, the proposed technique dynamically assigns priorities to tasks in different steps in order to achieve better makespans. The proposed technique has been evaluated through several experiments on real-world task graphs and on three different shapes of random task graphs with different numbers of tasks, and its efficiency has been compared with that of three task graph scheduling approaches. The obtained results demonstrate that the proposed CPA technique outperforms well-known heuristic scheduling strategies and improves their makespan by 13.47% on average. In addition, the experiments show that the proposed technique generates schedules in the order of milliseconds and that the average of its makespans is 12.05% longer than that of an optimum schedule.


12.
The multiprocessor scheduling problem is the problem of scheduling the tasks of a precedence-constrained task graph (representing a parallel program) onto the processors of a multiprocessor in a way that minimizes the completion time. Since this problem is known to be NP-hard in the strong sense in all but a few very restricted cases, heuristic algorithms are being developed that obtain near-optimal schedules in a reasonable amount of computation time. We present an efficient heuristic algorithm for scheduling precedence-constrained task graphs with nonnegligible intertask communication onto multiprocessors, taking contention in the communication channels into consideration. Our algorithm for obtaining satisfactory suboptimal schedules is based on the classical list scheduling strategy. It simultaneously exploits the schedule holes generated in the processors and in the communication channels during the scheduling process in order to produce better schedules. We demonstrate the effectiveness of our algorithm by comparing it with two competing heuristic algorithms available in the literature.
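The abstract above builds on the classical list scheduling strategy. The following is a minimal sketch of that baseline only, under stated simplifications: it ignores communication costs, channel contention, and schedule holes, all of which the paper's algorithm addresses. The names `list_schedule`, `preds`, and `priority`, and the example values, are hypothetical.

```python
def list_schedule(durations, preds, priority, num_procs):
    """Greedy list scheduling; returns {task: (processor, start, finish)}.

    Simplification for this sketch: zero communication cost, no contention.
    """
    proc_free = [0.0] * num_procs  # time at which each processor is idle
    schedule = {}
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in sorted(remaining)
                 if all(p in schedule for p in preds.get(t, []))]
        if not ready:
            raise ValueError("task graph contains a cycle")
        task = max(ready, key=lambda t: priority[t])
        data_ready = max((schedule[p][2] for p in preds.get(task, [])),
                         default=0.0)
        # Choose the processor that allows the earliest start time.
        proc = min(range(num_procs),
                   key=lambda i: max(proc_free[i], data_ready))
        start = max(proc_free[proc], data_ready)
        finish = start + durations[task]
        schedule[task] = (proc, start, finish)
        proc_free[proc] = finish
        remaining.remove(task)
    return schedule

# Hypothetical fork-join graph with bottom-level priorities.
durations = {"a": 2, "b": 3, "c": 4, "d": 1}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
priority = {"a": 7, "b": 4, "c": 5, "d": 1}
result = list_schedule(durations, preds, priority, num_procs=2)
makespan = max(finish for _, _, finish in result.values())
```

Here the priorities are bottom levels (longest path from a task to the exit task), one common choice; other priority functions fit the same loop.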

13.
Scheduling large task graphs is an important issue in parallel computing. In this paper we tackle the following two problems: (1) how to schedule a task graph when it is too large to fit into memory, and (2) how to build a generic program such that parameter values of a task graph can be given at run-time. Our answers feature the parameterized task graph (PTG), which is a symbolic representation of the task graph. We propose a dynamic scheduling algorithm which takes a PTG as input and allows us to generate a generic program. We present a theoretical study which shows that our algorithm finds good schedules for coarse-grain task graphs and has a low memory cost and a low computational complexity. When the average number of operations of each task is large enough, we prove that the scheduling overhead is negligible with respect to the makespan. We also provide experimental results that demonstrate the feasibility of our approach using several compute-intensive kernels found in numerical scientific applications.

14.
A genetic algorithm for multiprocessor scheduling   Cited by: 6 (self-citations: 0, citations by others: 6)
The problem of multiprocessor scheduling can be stated as finding a schedule for a general task graph to be executed on a multiprocessor system so that the schedule length is minimized. This scheduling problem is known to be NP-hard, and methods based on heuristic search have been proposed to obtain optimal and suboptimal solutions. Genetic algorithms have recently received much attention as a class of robust stochastic search algorithms for various optimization problems. In this paper, an efficient method based on genetic algorithms is developed to solve the multiprocessor scheduling problem. The representation of the search node is based on the order of the tasks being executed on each individual processor. The proposed genetic operator is based on the precedence relations between the tasks in the task graph. Simulation results are presented that compare the proposed genetic algorithm, the list scheduling algorithm, and the optimal schedule on random task graphs and on a robot inverse dynamics computational task graph.

15.
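Communication delays exist between resources in a grid. Redundancy through task duplication can reduce the communication overhead between tasks and shorten the overall computation time of a program. Current grid task scheduling algorithms mostly do not consider task duplication, while duplication-based scheduling algorithms tend to generate too many duplicated tasks, which increases system overhead and may even delay the computation. Because duplication-based task scheduling is an NP problem, this paper proposes a duplication-based grid resource scheduling algorithm whose primary goal is to reduce the schedule length and whose secondary goals are to reduce the number of duplicated tasks and the amount of resources occupied. The algorithm matches or outperforms other algorithms in schedule length, number of duplicated tasks, and number of resources used.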

16.
Contention-aware scheduling with task duplication   Cited by: 1 (self-citations: 0, citations by others: 1)
Finding an efficient schedule for a task graph on several processors is a trade-off between maximising concurrency and minimising interprocessor communication. Task duplication is a technique that has been employed to reduce or avoid interprocessor communication. Certain tasks are duplicated on several processors to produce the data locally and avoid communication among processors. Most of the algorithms using task duplication have been proposed for the classic scheduling model, which allows concurrent communication and ignores contention for communication resources. It is increasingly recognised that this classic model is unrealistic and does not permit creating accurate and efficient schedules. The recently proposed contention model introduces contention awareness into task scheduling by assigning the edges of the task graph to the links of the communication network. It is intuitive that scheduling under such a model benefits even more from task duplication, yet no such algorithm has been proposed, as it is not trivial to duplicate tasks under the contention model. This paper proposes a contention-aware task duplication scheduling algorithm. We investigate the fundamentals of task duplication in the contention model and propose an algorithm that is based on state-of-the-art techniques found in task duplication and contention-aware algorithms. An extensive experimental evaluation demonstrates significant improvements to the speedup of the produced schedules.

17.
On exploiting task duplication in parallel program scheduling   Cited by: 1 (self-citations: 0, citations by others: 1)
One of the main obstacles to obtaining high performance from message-passing multicomputer systems is the inevitable communication overhead incurred when tasks executing on different processors exchange data. Given a task graph, duplication-based scheduling can mitigate this overhead by allocating some of the tasks redundantly on more than one processor. In this paper, we focus on the problem of using duplication in static scheduling of task graphs on parallel and distributed systems. We discuss five previously proposed algorithms and examine their merits and demerits. We describe some of the essential principles for exploiting duplication in a more useful manner and, based on these principles, propose an algorithm which outperforms the previous algorithms. The proposed algorithm generates optimal solutions for a number of task graphs. The algorithm assumes an unbounded number of processors. For scheduling on a bounded number of processors, we propose a second algorithm which controls the degree of duplication according to the number of available processors. The proposed algorithms are analytically and experimentally evaluated and are also compared with the previous algorithms.

18.
The problem of scheduling non-deterministic graphs arises in several situations in scheduling parallel programs, particularly in the cases of loops and conditional branching. When scheduling loops in a parallel program, non-determinism arises because the number of loop iterations may not be known before the execution of the program. However, since loops form a restricted class of conditional branching, there is a higher degree of non-determinism associated with scheduling conditional branching. In this case, the direction of every branch remains unknown before run time. It follows that entire subprograms of the parallel program may or may not get executed, which in turn increases the amount of non-determinism and complicates the scheduling process. Thus, the term non-determinism is frequently associated with conditional branching in the literature. In this paper, we study the problem of constructing a static schedule for task graphs that contain conditional branching on parallel computers. Generally, it is difficult to obtain optimal solutions for various scheduling problems, even in the deterministic case. When non-determinism is added to the scheduling problem through conditional branching, an optimal solution is even harder to obtain. We start the paper with a brief discussion of the scheduling problem, then we introduce a model for representing parallel programs that contain branches. We present a two-step scheduling technique which employs two different approaches: a graph theoretic approach and a multi-phase approach. The first approach is based on exploring several graph theoretic properties of the model. This approach is used as a preprocessing step to decrease the amount of non-determinism before applying the multi-phase approach. In the second step, several execution instances of the program are generated, a schedule for every instance is obtained, and a unified schedule is constructed by merging the obtained schedules. Finally, we report the results of the experiments that we conducted to measure the performance of the techniques introduced in this paper.

19.
This paper presents a hybrid methodology for scheduling task graphs on multiprocessor embedded systems. The proposed methodology is designed for task graphs which are dynamic in nature due to the presence of conditional tasks as well as tasks whose execution times are unpredictable but bounded. We present the methodology as a three-phase strategy in which task nodes are mapped to the processors in the first (static mapping) phase. In the second (selective duplication) phase, some critical nodes are identified and duplicated for possible rescheduling at run-time, depending on the code memory constraints of the processors. The third (online) phase is a run-time scheduling algorithm that performs list scheduling based on the actual dynamics of the schedule up to the current time. We show that this technique provides better schedule length (up to 20%) compared to previous techniques which are predominantly static in nature, with low overhead and a complexity comparable to that of existing online techniques. The effects of model parameters such as the number of processors, memory, and various task graph parameters on performance are investigated in this paper.

20.
This paper addresses the problem of minimizing the schedule length (makespan) of a batch of jobs with different arrival times. A job is described by a directed acyclic graph (DAG) of parallel tasks. The paper proposes a dynamic scheduling method that adapts the schedule when new jobs are submitted and that may change the processors assigned to a job during its execution. The scheduling method is divided into a scheduling strategy and a scheduling algorithm. We also propose an adaptation of the Heterogeneous Earliest-Finish-Time (HEFT) algorithm, called here P-HEFT, to handle parallel tasks in heterogeneous clusters with good efficiency without compromising the makespan. The results of a comparison of this algorithm with another DAG scheduler, using a simulation of several machine configurations and job types, show that P-HEFT gives a shorter makespan for a single DAG but scores worse for multiple DAGs. Finally, the results of the dynamic scheduling of a batch of jobs using the proposed scheduling method showed significant improvements for more heavily loaded machines when compared to the alternative resource reservation approach.
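For context on the HEFT algorithm that the abstract above adapts, the sketch below shows only the task-prioritization step of standard HEFT: the upward rank of a task is its average computation cost plus the maximum, over its successors, of the average communication cost to the successor plus the successor's upward rank. It does not model P-HEFT or parallel tasks. The function name, argument layout, and example costs are assumptions made for illustration.

```python
from functools import lru_cache

def upward_ranks(avg_cost, avg_comm, succs):
    """Standard HEFT upward rank for every task.

    avg_cost: {task: average computation cost across processors}
    avg_comm: {(task, successor): average communication cost}
    succs:    {task: [successors]}
    """
    @lru_cache(maxsize=None)
    def rank(task):
        return avg_cost[task] + max(
            (avg_comm[(task, s)] + rank(s) for s in succs.get(task, [])),
            default=0.0,  # exit tasks have no successors
        )
    return {t: rank(t) for t in avg_cost}

# Hypothetical fork-join graph: a -> {b, c} -> d
avg_cost = {"a": 2, "b": 3, "c": 4, "d": 1}
avg_comm = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 1, ("c", "d"): 3}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

ranks = upward_ranks(avg_cost, avg_comm, succs)
# HEFT considers tasks in order of decreasing upward rank.
order = sorted(ranks, key=ranks.get, reverse=True)
```

Sorting by decreasing upward rank always yields a valid topological order when costs are positive, which is why HEFT can then assign tasks to processors one at a time.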
