首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 771 毫秒
1.
For fine grain task graphs, duplication-based scheduling algorithms are generally more efficient than list and cluster-based algorithms. However, most duplication-based heuristics try to duplicate all possible ancestor nodes of a given join node, in order to reduce the earliest start time (EST) of the join node, even though these ancestor nodes have already been allocated in previous steps. Thus, these duplication heuristics inevitably induce redundant duplications, which lead to the superfluous consumption of resources and generally deteriorate the scheduling result in the case of a bounded number of processors. When scheduling algorithms are used on an unbounded number of processors, the required number of processors grows excessively with the size of the task graph, thereby limiting the practicality of these algorithms for large task graphs. In this paper, we propose a novel algorithm designed to allocate join nodes without redundant duplications. In the proposed algorithm, if the ancestor nodes of a join node are duplicated when scheduling the join node, the original allocations of these ancestor nodes are removed using a very efficient method. The performance of the proposed algorithm, in terms of its normalized schedule length and efficiency, is compared with that of some of the recently proposed algorithms. The proposed algorithm generates better or comparable schedules with minimized duplication. Specifically, the simulation results show that it is most useful on a bounded number of processors.  相似文献   

2.
In the area of parallelizing compilers, considerable research has been carried out on data dependency analysis, parallelism extraction, as well as program and data partitioning. However, designing a practical, low complexity scheduling algorithm without sacrificing performance remains a challenging problem. A variety of heuristics have been proposed to generate efficient solutions but they take prohibitively long execution times for moderate size or large problems. In this paper, we propose an algorithm called FASTEST (Fast Assignment and Scheduling of Tasks using an Efficient Search Technique) that has O(e) time complexity, where e is the number of edges in the task graph. The algorithm first generates an initial solution in a short time and then refines it by using a simple but robust random neighborhood search. We have also parallelized the search to further lower the time complexity. We are using the algorithm in a prototype automatic parallelization and scheduling tool which compiles sequential code and generates parallel code optimized with judicious scheduling. The proposed algorithm is evaluated with several application programs and outperforms a number of previous algorithms by generating parallelized code with shorter execution times, while taking dramatically shorter scheduling times. The FASTEST algorithm generates optimal solutions for a majority of the test cases and close-to-optimal solutions for the rest  相似文献   

3.
On parallelizing the multiprocessor scheduling problem   总被引:1,自引:0,他引:1  
Existing heuristics for scheduling a node and edge weighted directed task graph to multiple processors can produce satisfactory solutions but incur high time complexities, which tend to exacerbate in more realistic environments with relaxed assumptions. Consequently, these heuristics do not scale well and cannot handle problems of moderate sizes. A natural approach to reducing complexity, while aiming for a similar or potentially better solution, is to parallelize the scheduling algorithm. This can be done by partitioning the task graphs and concurrently generating partial schedules for the partitioned parts, which are then concatenated to obtain the final schedule. The problem, however, is nontrivial as there exists dependencies among the nodes of a task graph which must be preserved for generating a valid schedule. Moreover, the time clock for scheduling is global for all the processors (that are executing the parallel scheduling algorithm), making the inherent parallelism invisible. In this paper, we introduce a parallel algorithm that is guided by a systematic partitioning of the task graph to perform scheduling using multiple processors. The algorithm schedules both the tasks and messages, and is suitable for graphs with arbitrary computation and communication costs, and is applicable to systems with arbitrary network topologies using homogeneous or heterogeneous processors. We have implemented the algorithm on the Intel Paragon and compared it with three closely related algorithms. The experimental results indicate that our algorithm yields higher quality solutions while using an order of magnitude smaller scheduling times. The algorithm also exhibits an interesting trade-off between the solution quality and speedup while scaling well with the problem size  相似文献   

4.
Scheduling precedence constrained task graphs, with or without duplication, is one of the most challenging NP-complete problems in parallel and distributed computing systems. Duplication heuristics are more effective, in general, for fine grain task graphs and for networks with high communication latencies. However, most of the available duplication algorithms are designed under the assumption of unbounded availability of fully connected processors, and lie in a high complexity range. Low complexity optimal duplication algorithms work under restricted cost and/or shape parameters for the task graphs. Further, the required number of processors grows in proportion to the task-graph size significantly. An improved duplication strategy is proposed that works for arbitrary task graphs, with a limited number of interconnection-constrained processors. Unlike most other algorithms that replicate all possible parents/ancestors of a given task, the proposed algorithm tends to avoid redundant duplications and duplicates the nodes selectively, only if it helps in improving the performance. This results in lower duplications and also lower time and space complexity. Simulation results are presented for clique and an interconnection-constrained network topology with random and regular benchmark task graph suites, representing a variety of parallel numerical applications. Performance, in terms of normalized schedule length and efficiency, is compared with some of the well-known and recently proposed algorithms. The suggested algorithm turns out to be most efficient, as it generates better or comparable schedules with remarkably less processor consumption.  相似文献   

5.
We propose new models for competitive facility location and pricing as bilevel Boolean linear programming problems. We obtain results that characterize the complexity of the problem where a monopolist’s profit on each of the markets is defined with a monotone nonincreasing function of the servicing cost. For this problem, we also propose two approximate algorithms based on the ideas of alternating heuristics and local search. We give results of a computational experiment that show a possibility for fast computation of approximate solutions.  相似文献   

6.
Allocating tasks to processors is a well-known NP-Hard problem in distributed computing systems. Due to the lack of practicable exact solutions, it has been attracted by the researchers working on heuristic-based suboptimal search algorithms. With the recent inclusion of multiple objectives such as minimizing the cost, maximizing the throughput and maximizing the reliability, the problem gets even more complex and an efficient approximate method becomes more valuable. In this work, I propose a new solution for the multi-objective task allocation problem. My solution consists in designing a problem-specific neighboring function for an existing metaheuristic algorithm that is proven to be successful in quadratic assignment problems. The neighboring function, namely greedy reassignment with maximum release (GR-MR), provides a dynamic mechanism to switch the preference of the search between the exploration and exploitation. The experiments validate both that the quality of the solutions are close to the optimal and the proposed method performs significantly better comparing to three other metaheuristic algorithms. Neighboring functions being the common reusable components of metaheuristic algorithms, GR-MR can also be utilized by other metaheuristic-based solutions in the future.  相似文献   

7.
On exploiting task duplication in parallel program scheduling   总被引:1,自引:0,他引:1  
One of the main obstacles in obtaining high performance from message-passing multicomputer systems is the inevitable communication overhead which is incurred when tasks executing on different processors exchange data. Given a task graph, duplication-based scheduling can mitigate this overhead by allocating some of the tasks redundantly on more than one processor. In this paper, we focus on the problem of using duplication in static scheduling of task graphs on parallel and distributed systems. We discuss five previously proposed algorithms and examine their merits and demerits. We describe some of the essential principles for exploiting duplication in a more useful manner and, based on these principles, propose an algorithm which outperforms the previous algorithms. The proposed algorithm generates optimal solutions for a number of task graphs. The algorithm assumes an unbounded number of processors. For scheduling on a bounded number of processors, we propose a second algorithm which controls the degree of duplication according to the number of available processors. The proposed algorithms are analytically and experimentally evaluated and are also compared with the previous algorithms  相似文献   

8.
Network of workstation (NOW) is a cost-effective alternative to massively parallel supercomputers. As commercially available off-the-shelf processors become cheaper and faster, it is now possible to build a cluster that provides high computing power within a limited budget. However, a cluster may consist of different types of processors and this heterogeneity complicates the design of efficient collective communication protocols. For example, it is a very hard combinatorial problem to find an optimal reduction schedule for such heterogeneous clusters. Nevertheless, we show that a simple technique called slowest-node-first (SNF) is very effective in designing efficient reduction protocols for heterogeneous clusters. First, we show that SNF is actually a 2-approximation algorithm, which means that an SNF schedule length is always within twice of the optimal schedule length, no matter what kind of cluster is given. In addition, we show that SNF does give the optimal reduction time when the cluster consists of two types of processors, when the ratio of communication speed between them is at least two. When the communication speed ratio is less than two, we develop a dynamic programming technique to find the optimal schedule. Our dynamic programming utilizes the monotone property of the objective function, and can significantly reduce the amount of computation time. Finally, combined with an approximation algorithm for broadcast 2004, we propose an all-reduction algorithm which sends the reduction answer to all processors, with approximation ratio 3.5. We conduct three groups of experiments. First, we show that SNF performs better than the built-in MPI_Reduce in a test cluster. Second, we observe a factor of 93 times saving in computation time to find the optimal schedule, when compared with a naive dynamic programming implementation. Thirdly, we apply the theoretical results to a branch-and-bound search and show that they can reduce the search time of the optimal reduction schedule by a factor of 500, when the cluster has three kinds of processors.  相似文献   

9.
发现关联规则是数据挖掘的一项重要任务,本文介绍了几种数据挖掘的串行和并行算法。其中IDD算法是一种高效的和易于扩展的发现关联规则的并行算法,然而,当处理嚣数目增加时,由于负载的失衡导致其效率的严重下降,于是通过引入近似算法成功地解决了这个问题。我们给出了两种近似算法和其性能证明,其一是在线算法,另一种是离线算法。在本文的最后,我们进行了改进的IDD算法的复杂性分析。  相似文献   

10.
Computing an optimal solution to the knapsack problem is known to be NP-hard. Consequently, fast parallel algorithms for finding such a solution without using an exponential number of processors appear unlikely. An attractive alternative is to compute an approximate solution to this problem rapidly using a polynomial number of processors. In this paper, we present an efficient parallel algorithm for finding approximate solutions to the 0–1 knapsack problem. Our algorithm takes an , 0 < < 1, as a parameter and computes a solution such that the ratio of its deviation from the optimal solution is at most a fraction of the optimal solution. For a problem instance having n items, this computation uses O(n5/2/3/2) processors and requires O(log3n + log2nlog(1/)) time. The upper bound on the processor requirement of our algorithm is established by reducing it to a problem on weighted bipartite graphs. This processor complexity is a significant improvement over that of other known parallel algorithms for this problem.  相似文献   

11.
一个调度Fork-Join任务图的最优算法   总被引:5,自引:0,他引:5  
Fork-Join任务图是一种并行处理的基本结构.虽然许多算法在任务满足某些条件时能产生最优调度,但往往没有考虑节省处理器个数和减少任务集的总完成时间,从而降低算法的加速比和效率.因此,提出一种基于任务复制的平衡调度算法,其时间复杂度为O(vq+vlogv),v和q分别表示任务集中任务的个数和使用的处理器个数.通过分析已用处理器的负载和空闲时间段,把任务尽量分配到已用的处理器上以均衡负载,从而提高其利用率.实验结果表明,该算法的加速比和总体效率优于其他算法.因此,该算法对于高性能应用程序的调度是一个较好的选择.  相似文献   

12.
Effective task scheduling is essential for obtaining high performance in heterogeneous distributed computing systems (HeDCSs). However, finding an effective task schedule in HeDCSs requires the consideration of both the heterogeneity of processors and high interprocessor communication overhead, which results from non-trivial data movement between tasks scheduled on different processors. In this paper, we present a new high-performance scheduling algorithm, called the longest dynamic critical path (LDCP) algorithm, for HeDCSs with a bounded number of processors. The LDCP algorithm is a list-based scheduling algorithm that uses a new attribute to efficiently select tasks for scheduling in HeDCSs. The efficient selection of tasks enables the LDCP algorithm to generate high-quality task schedules in a heterogeneous computing environment. The performance of the LDCP algorithm is compared to two of the best existing scheduling algorithms for HeDCSs: the HEFT and DLS algorithms. The comparison study shows that the LDCP algorithm outperforms the HEFT and DLS algorithms in terms of schedule length and speedup. Moreover, the improvement in performance obtained by the LDCP algorithm over the HEFT and DLS algorithms increases as the inter-task communication cost increases. Therefore, the LDCP algorithm provides a practical solution for scheduling parallel applications with high communication costs in HeDCSs.  相似文献   

13.
Consider directed acyclic graph (DAG) scheduling for a large heterogeneous system, which consists of processors with varying processing capabilities and network links with varying bandwidths. The search space of possible task schedules for this problem is immense. One possible approach for this optimization problem, which is NP-hard, is to start with the best task schedule found by a fast deterministic task scheduling algorithm and then iteratively attempt to improve the task schedule by employing a general random guided search method. However, such an approach can lead to extremely long search times, and the solutions found are sometimes not significantly better than those found by the original deterministic task scheduling algorithm. In this paper, we propose an alternative strategy, termed Push-Pull, which starts with the best task schedule found by a fast deterministic task scheduling algorithm and then iteratively attempts to improve the current best solution using a deterministic guided search method. Our simulation results show that given similar runtimes, the Push-Pull algorithm performs well, achieving results similar to or better than all of the other algorithms being compared.  相似文献   

14.
The Garey–Johnson algorithm is a well known polynomial-time algorithm constructing an optimal schedule for the maximum lateness problem with unit execution time tasks, two parallel identical processors, precedence constraints and release times. The paper is concerned with the worst-case analysis of a generalization of the Garey–Johnson algorithm to the case of arbitrary number of processors. In contrast to other algorithms for the maximum lateness problem, the tight performance guarantee for the even number of processors differs from the tight performance guarantee for the odd number of processors.  相似文献   

15.
Cloud computing has been attracting considerable attention since the last decade. This study considers a decision problem formulated from the use of computing services over the Internet. An agent receives orders of computing tasks from his/her clients and on the other hand he/she acquires computing resources from computing service providers to fulfill the requirements of the clients. The processors are bundled as packages according to their speeds and the business strategies of the providers. The packages are rated at a certain pricing scheme to provide flexible purchasing options to the agent. The decision of the agent is to select the packages which can be acquired from the service providers and then schedule the tasks of the clients onto the processors of the acquired packages such that the total cost, including acquisition cost and scheduling cost (total weighted tardiness), is minimized. In this study, we present an integer programming model to formulate the problem and propose several solution methods to produce acquisition and scheduling plans. Ten well-known heuristics of parallel-machine scheduling are adapted to fit into the studied problem so as to provide initial solutions. Tabu search and genetic algorithm are tailored to reflect the problem nature for improving upon the initial solutions. We conduct a series of computational experiments to evaluate the effectiveness and efficiency of all the proposed algorithms. The results of the numerical experiments reveal that the proposed tabu search and genetic algorithm can attain significant improvements.  相似文献   

16.
具备可重配置流水线总线的线性阵列LARPBS(1inear arrays with a reconfigurable pipelined bus systems)是近来出现的一种高效的并行计算模型.与理想的PRAM模型不同.LARPBS是现实可行的。基于LARPBS模型,Y.Pan介绍了2种宽度和精度任意的数据项的最大值查找算法:算法1使用了N^2/2个处理机、O(1)时间,它是目前时间最优的算法;算法2使用了N个处理机、O(loglogN)时间。本文介绍了2种最大值查找算法.时间复杂度同Y.Pan的算法,但所用处理机数减少了一半.这是对Y.Pan算法的重要改进。  相似文献   

17.
The problem of maximal clique enumeration (MCE) is to enumerate all of the maximal cliques in a graph. Once enumerated, maximal cliques are widely used to solve problems in areas such as 3-D protein structure alignment, genome mapping, gene expression analysis, and detection of social hierarchies. Even the most efficient serial MCE algorithms require large amounts of time to enumerate the maximal cliques in networks arising from these problems that contain hundreds, thousands, or larger numbers of vertices. The previous attempts to provide practical solutions to the MCE problem through parallel implementation have had limited success, largely due to a number of challenges inherent to the nature of the MCE combinatorial search space. On the one hand, MCE algorithms often create a backtracking search tree that has a highly irregular and hard-or-impossible to predict structure; therefore, almost any static decomposition of the search tree by parallel processors results in highly unbalanced processor execution times. On the other hand, the data-intensive nature of the MCE problem often makes naive dynamic load distribution strategies that require extensive data movement prohibitively expensive. As a result, good scaling of the overall execution time of parallel MCE algorithms has been reported for only up to a couple hundred processors. In this paper, we propose a parallel, scalable, and memory-efficient MCE algorithm for distributed and/or shared memory high performance computing architectures, whose runtime scales linearly for thousands of processors on real-world application graphs with hundreds and thousands of nodes. Its scalability and efficiency are attributed to the proposed: (a) representation of the search tree decomposition to enable parallelization; (b) parallel depth-first backtracking search to both constrain the search space and minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing intelligently coupled with work stack splitting to minimize computing elements’ idle time. To the best of our knowledge, the proposed parallel MCE algorithm is the first to achieve a linear scaling runtime using up to 2048 processors on Cray XT machines for a number of real-world biological networks.  相似文献   

18.
Abstract: Many real‐world visual tracking applications have a high dimensionality, i.e. the system state is defined by a large number of variables. This kind of problem can be modelled as a dynamic optimization problem, which involves dynamic variables whose values change in time. Most applied research on optimization methods have focused on static optimization problems but these static methods often lack explicit adaptive methodologies. Heuristics are specific methods for solving problems in the absence of an algorithm for formal proof. Metaheuristics are approximate optimization methods which have been applied to more general problems with significant success. However, particle filters are Monte Carlo algorithms which solve the sequential estimation problem by approximating the theoretical distributions in the state space by simulated random measures called particles. However, particle filters lack efficient search strategies. In this paper, we propose a general framework to hybridize heuristics/metaheuristics with particle filters properly. The aim of this framework is to devise effective hybrid visual tracking algorithms naturally, guided by the use of abstraction techniques. Resulting algorithms exploit the benefits of both complementary approaches. As a particular example, a memetic algorithm particle filter is derived from the proposed hybridization framework. Finally, we show the performance of the memetic algorithm particle filter when it is applied to a multiple object tracking problem.  相似文献   

19.
In order to minimize the execution time of a parallel application running on a heterogeneously distributed computing system, an appropriate mapping scheme is needed to allocate the application tasks to the processors. The general problem of mapping tasks to machines is a well‐known NP‐hard problem and several heuristics have been proposed to approximate its optimal solution. In this paper we propose a static graph‐based mapping algorithm, called Heterogeneous Multi‐phase Mapping (HMM), which permits suboptimal mapping of a parallel application onto a heterogeneous computing distributed system by using a local search technique together with a tabu search meta‐heuristic. HMM allocates parallel tasks by exploiting the information embedded in the parallelism forms used to implement an application, and considering an affinity parameter, that identifies which machine in the heterogeneous computing system is most suitable to execute a task. We compare HMM with some leading techniques and with an exhaustive mapping algorithm. We also give an example of mapping of two real applications using HMM. Experimental results show that HMM performs well demonstrating the applicability of our approach. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

20.
Efficient task scheduling on heterogeneous distributed computing systems (HeDCSs) requires the consideration of the heterogeneity of processors and the inter-processor communication. This paper presents a two-phase algorithm, called H2GS, for task scheduling on HeDCSs. The first phase implements a heuristic list-based algorithm, called LDCP, to generate a high quality schedule. In the second phase, the LDCP-generated schedule is injected into the initial population of a customized genetic algorithm, called GAS, which proceeds to evolve shorter schedules. GAS employs a simple genome composed of a two-dimensional chromosome. A mapping procedure is developed which maps every possible genome to a valid schedule. Moreover, GAS uses customized operators that are designed for the scheduling problem to enable an efficient stochastic search. The performance of each phase of H2GS is compared to two leading scheduling algorithms, and H2GS outperforms both algorithms. The improvement in performance obtained by H2GS increases as the inter-task communication cost increases.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号