1.

Job scheduling and processor allocation for grid computing on metacomputers

《Journal of Parallel and Distributed Computing》2005,65(11):1406-1418

Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. A cost model is proposed for modeling communication heterogeneity on computational grids. A processor allocation algorithm is developed which always finds an optimal processor allocation that minimizes the effective execution time of a job when the job is being scheduled. It is proven that the list scheduling (LS) algorithm can achieve a reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. We compare the performance of various job scheduling and processor allocation algorithms for grid computing on metacomputers. We evaluate the performance of 128 combinations of two job scheduling algorithms, four initial job ordering strategies, four processor allocation algorithms, and four metacomputers by extensive simulation. It is found that the combination of largest job first (LJF) initial job ordering and the minimum effective execution time (MEET) or largest machine first (LMF) processor allocation algorithm yields the best average-case performance, and that the choice between FCFS and LS depends on the range of job sizes. It is also observed that communication heterogeneity has a significant impact on schedule lengths.
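As a rough illustration of list scheduling with a largest-first initial ordering, the sketch below uses a deliberately simplified setting that is not the paper's model: identical machines, each job occupying exactly one machine, and "largest" meaning longest run time. The function name `list_schedule` and its parameters are hypothetical.

```python
import heapq


def list_schedule(job_times, num_machines, largest_first=True):
    """Greedy list scheduling on identical machines (simplified assumption).

    Returns (makespan, assignment), where assignment[i] is the machine
    that job i runs on.
    """
    order = list(range(len(job_times)))
    if largest_first:
        # Largest-first initial ordering: longest jobs are placed first.
        order.sort(key=lambda i: job_times[i], reverse=True)

    # Min-heap of (current load, machine id); the top is the machine
    # that becomes free earliest.
    machines = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(machines)

    assignment = [None] * len(job_times)
    for i in order:
        load, m = heapq.heappop(machines)
        assignment[i] = m
        heapq.heappush(machines, (load + job_times[i], m))

    makespan = max(load for load, _ in machines)
    return makespan, assignment
```

With run times `[2, 3, 7, 4, 5]` on two machines, this sketch gives a makespan of 11 with largest-first ordering and 12 when jobs are taken in arrival order, which shows why the initial ordering matters even in this toy setting.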

2.

Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids (total citations: 1; self-citations: 0; citations by others: 1)

Chtepen M., Claeys F.H.A., Dhoedt B., De Turck F., Demeester P., Vanrolleghem P.A. 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(2):180-190

A grid is a distributed computational and storage environment often composed of heterogeneous, autonomously managed subsystems. As a result, varying resource availability becomes commonplace, often resulting in loss and delay of executing jobs. To ensure good grid performance, fault tolerance should be taken into account. Commonly utilized techniques for providing fault tolerance in distributed systems are periodic job checkpointing and replication. While very robust, both techniques can delay job execution if inappropriate checkpointing intervals and replica numbers are chosen. This paper introduces several heuristics that dynamically adapt the above-mentioned parameters based on information on grid status to provide high job throughput in the presence of failure while reducing the system overhead. Furthermore, a novel fault-tolerant algorithm combining checkpointing and replication is presented. The proposed methods are evaluated in a newly developed grid simulation environment, Dynamic Scheduling in Distributed Environments (DSiDE), which allows for easy modeling of dynamic system and job behavior. Simulations are run employing workload and system parameters derived from logs that were collected from several large-scale parallel production systems. Experiments have shown that adaptive approaches can considerably improve system performance, while the preference for one of the solutions depends on particular system characteristics, such as load, job submission patterns, and failure frequency.

3.

Job scheduling and dynamic data replication in data grid environment

Najme Mansouri Gholam Hosein Dastghaibyfard 《The Journal of supercomputing》2013,64(1):204-225

A Data Grid is a geographically distributed environment that deals with large-scale data-intensive applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Data replication is another key optimization technique for reducing access latency and managing large data by storing data wisely. In this paper, two algorithms are proposed: first, a novel job scheduling algorithm called Combined Scheduling Strategy (CSS) that considers the number of jobs waiting in the queue, the location of the data required by the job, and computational capability; second, a dynamic data replication strategy called Dynamic Hierarchical Replication Algorithm (DHRA) that improves file access time. DHRA stores each replica in an appropriate site, i.e., the site in the requested region that has the highest number of accesses for that particular replica. It can also minimize access latency by selecting the best replica when various sites hold replicas of datasets. The simulation results demonstrate that the proposed replication and scheduling strategies give better performance than the other algorithms.

4.

Adaptive checkpointing strategy to tolerate faults in economy based grid (total citations: 3; self-citations: 2; citations by others: 1)

Babar Nazir Kalim Qureshi Paul Manuel 《The Journal of supercomputing》2009,50(1):1-18

In this paper, we develop a fault tolerant job scheduling strategy in order to tolerate faults gracefully in an economy based grid environment. We propose a novel adaptive task checkpointing based fault tolerant job scheduling strategy for an economy based grid. The proposed strategy maintains a fault index of grid resources. It dynamically updates the fault index based on successful or unsuccessful completion of an assigned task. Whenever a grid resource broker has tasks to schedule on grid resources, it makes use of the fault index from the fault tolerant schedule manager in addition to using a time optimization heuristic. While scheduling a grid job on a grid resource, the resource broker uses the fault index to apply different intensities of task checkpointing (inserting checkpoints in a task at different intervals). To simulate and evaluate the performance of the proposed strategy, this paper enhances the GridSim Toolkit-4.0 to exhibit fault tolerance related behavior. We also compare the “checkpointing fault tolerant job scheduling strategy” with the well-known time optimization heuristic in an economy based grid environment. From the measured results, we conclude that even in the presence of faults, the proposed strategy effectively schedules grid jobs, tolerating faults gracefully, and executes more jobs successfully within the specified deadline and allotted budget. It also improves the overall execution time and minimizes the execution cost of grid jobs.
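The fault-index idea can be sketched as follows. The abstract does not give the update rule or the mapping from fault index to checkpoint interval, so everything concrete here is an assumption made for illustration: the class name `FaultIndexTable`, the plus-one/minus-one update, the floor at zero, and the `base_interval / (1 + index)` mapping.

```python
class FaultIndexTable:
    """Per-resource fault index (hypothetical update rule)."""

    def __init__(self):
        self.index = {}

    def record(self, resource, success):
        """Update the index after a task completes or fails on a resource."""
        current = self.index.get(resource, 0)
        if success:
            # Assumed rule: a successful task lowers the index, never below 0.
            self.index[resource] = max(0, current - 1)
        else:
            # Assumed rule: a failed task raises the index by one.
            self.index[resource] = current + 1

    def checkpoint_interval(self, resource, base_interval=600.0):
        """Seconds between checkpoints: a higher index means denser checkpoints."""
        return base_interval / (1 + self.index.get(resource, 0))
```

A broker could consult `checkpoint_interval` when dispatching a task, so that resources with a history of failures checkpoint more often while reliable ones pay less checkpointing overhead.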

5.

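Research on a checkpointing algorithm for grid computing service systems

张至柔 《Computer Engineering and Design》2008,29(14)

Reducing fault-tolerance overhead and improving computational efficiency is an important problem in grid computing service systems that are built from idle computers inside an organization and that serve computational mobile agents. A new checkpointing algorithm is proposed that is non-closed, non-blocking, and low-overhead. Its synchronized garbage collection process avoids the inconsistent states that arise when processes fall out of step while establishing new checkpoints and discarding old ones. Experimental results show that the algorithm's overhead grows linearly with the number of nodes in the system.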

6.

Maximizing availability for task scheduling in computational grid using genetic algorithm

Shiv Prakash Deo Prakash Vidyarthi 《Concurrency and Computation》2015,27(1):193-210

A computational grid provides a wide distributed platform for high‐end compute‐intensive applications. Grid scheduling is often carried out to schedule the submitted jobs on the nodes of the grid so that some characteristic parameter is optimized. Availability of the computational nodes is one of the important characteristic parameters and measures the probability that a node is available for job execution. This paper addresses the availability of the grid computational nodes for job execution and proposes a model to maximize it. The task scheduling problem in a grid is nondeterministic polynomial‐time hard (NP‐hard), and metaheuristic techniques are often applied to solve it. The genetic algorithm, a metaheuristic technique based on evolutionary computation, has been used to solve such complex optimization problems. This work proposes a technique for the grid scheduling problem using a genetic algorithm with the objective of maximizing availability. A simulation experiment is conducted to evaluate the performance of the proposed algorithm, and the results reveal the effectiveness of the model. A comparative study has also been performed. Copyright © 2014 John Wiley & Sons, Ltd.

7.

Parallel machine scheduling problems using memetic algorithms (total citations: 2; self-citations: 0; citations by others: 2)

Runwei Cheng Mitsuo Gen 《Computers & Industrial Engineering》1997,33(3-4):761-764

In this paper, we investigate how to apply hybrid genetic algorithms (memetic algorithms) to the parallel machine scheduling problem. Two essential issues must be dealt with in all kinds of parallel machine scheduling problems: the partition of jobs among machines and the sequence of jobs within each machine. The basic idea of the proposed method is to (a) use a genetic algorithm to evolve the job partition and then (b) apply a local optimizer to adjust the job permutation, pushing each chromosome to climb to its local optimum. Preliminary computational experiments demonstrate that the hybrid genetic algorithm outperforms the genetic algorithms and the conventional heuristics.

8.

A multi-dimensional scheduling scheme in a Grid computing environment

B.T. Benjamin Khoo Bharadwaj Veeravalli Terence Hung C.W. Simon See 《Journal of Parallel and Distributed Computing》2007

In this paper, we propose a novel distributed resource-scheduling algorithm capable of handling multiple resource requirements for jobs that arrive in a Grid computing environment. In our proposed algorithm, referred to as the multiple resource scheduling (MRS) algorithm, we take into account both the site capabilities and the resource requirements of jobs. The main objective of the algorithm is to obtain a minimal execution schedule through efficient management of available Grid resources. We first propose a model in which the job and site resource characteristics can be captured together and used in the scheduling algorithm. To do so, we introduce the concepts of an n-dimensional virtual map and resource potential. Based on the proposed model, we conduct rigorous simulation experiments with real-life workload traces reported in the literature to quantify the performance. We compare our strategy with most of the commonly used algorithms on performance metrics such as job wait times, queue completion times, and average resource utilization. Our combined consideration of job and resource characteristics is shown to yield high performance with respect to the above-mentioned metrics. Our study also reveals that the MRS scheme can adapt to both serial and parallel job requirements, especially when job fragmentation occurs. Our experimental results clearly show that MRS outperforms other strategies, and we highlight the impact and importance of our strategy.

9.

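An adaptive dynamic grid task scheduling algorithm (total citations: 1; self-citations: 0; citations by others: 1)

张秋余 柴进 《Journal of Computer Applications》2006,26(10):2267-2269

The GRACE grid resource framework is a distributed computational economy architecture. To address the problem of allocating grid resources within this framework, the myopic algorithm is introduced and an adaptive dynamic grid task scheduling algorithm is proposed. The algorithm dynamically monitors the system's degree of load balance during scheduling and adaptively selects the task scheduling policy. Simulation experiments show that the algorithm improves the task scheduling success rate.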

10.

Security‐aware scheduling model for computational grid

Rekha Kashyap Deo Prakash Vidyarthi 《Concurrency and Computation》2012,24(12):1377-1391

Grid applications with stringent security requirements introduce challenging concerns because the schedule devised by nonsecurity‐aware scheduling algorithms may suffer when scheduling security‐constrained tasks. To make scheduling security‐aware, estimation and quantification of security overhead is necessary. The proposed model quantifies security, in the form of security levels, on the basis of the cipher suite negotiated between the task and the grid node, and incorporates it into the existing heuristics MinMin and MaxMin to make them security‐aware: MinMin(SA) and MaxMin(SA). It also proposes SPMaxMin (Security Prioritized MaxMin) and compares it with three heuristics, MinMin(SA), MaxMin(SA), and SPMinMin, on heterogeneous grid/task environments. Extensive computer simulation results reveal that the performance of the various heuristics varies with computational and security heterogeneity. Analysis over nine heterogeneous grid/task workload situations indicates that an algorithm that performs better for one workload degrades in another. Notably, for a particular workload one algorithm gives a better makespan while another gives a better response time. Finally, a security‐aware scheduling model is proposed, which adapts itself to the dynamic nature of the grid and picks the best suited algorithm among the four analyzed heuristics on the basis of job characteristics, grid characteristics, and the desired performance metric. Copyright © 2011 John Wiley & Sons, Ltd.
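For readers unfamiliar with MinMin, the sketch below shows the standard heuristic with an optional additive security overhead per task/node pair. The additive form of the overhead and the names `min_min`, `exec_time`, and `security_overhead` are assumptions for illustration; the abstract does not specify how the paper folds security levels into completion times.

```python
def min_min(exec_time, security_overhead=None):
    """MinMin scheduling sketch.

    exec_time[t][n] is the run time of task t on node n.
    security_overhead[t][n] is an assumed additive cost (defaults to 0).
    Returns (schedule, makespan), where schedule maps task -> node.
    """
    num_tasks = len(exec_time)
    num_nodes = len(exec_time[0])
    if security_overhead is None:
        security_overhead = [[0.0] * num_nodes for _ in range(num_tasks)]

    ready = [0.0] * num_nodes          # time at which each node becomes free
    unscheduled = set(range(num_tasks))
    schedule = {}

    while unscheduled:
        # MinMin picks the task whose best completion time is smallest,
        # which is the global minimum over all (task, node) pairs.
        best = None                     # (completion time, task, node)
        for t in sorted(unscheduled):
            for n in range(num_nodes):
                finish = ready[n] + exec_time[t][n] + security_overhead[t][n]
                if best is None or finish < best[0]:
                    best = (finish, t, n)
        finish, t, n = best
        schedule[t] = n
        ready[n] = finish
        unscheduled.remove(t)

    return schedule, max(ready)
```

MaxMin differs only in the selection step: it computes each task's best completion time in the same way, but then schedules the task whose best completion time is largest.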

11.

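Design and performance analysis of a hybrid real-time fault-tolerant scheduling algorithm (total citations: 17; self-citations: 2; citations by others: 15)

秦啸 韩宗芬 庞丽萍 李胜利 《Journal of Software》2000,11(5):686-693

The real-time fault-tolerant scheduling algorithms studied in the previous literature can only schedule a single kind of task, namely tasks with fault-tolerance requirements. This paper establishes a hybrid real-time fault-tolerant scheduling model and proposes a static real-time fault-tolerant scheduling algorithm that can schedule real-time tasks with and without fault-tolerance requirements at the same time. It also proposes an algorithm for finding the minimum number of processors, which is used to analyze the performance of the static algorithm by simulation. To improve on the scheduling performance of the static algorithm, a dynamic scheduling algorithm is proposed. Finally, the performance of the static and dynamic algorithms is analyzed through simulation experiments. The experiments show that scheduling performance depends on system parameters such as the number of real-time tasks, task computation times, task periods, and the number of processors.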

12.

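A PVM-based quasi-synchronous checkpointing method

张宇 张玉芳 《Computer Engineering and Design》2006,27(3):494-496

Checkpointing is an important means of achieving fault tolerance in parallel systems, and synchronous checkpointing is widely used in workstation clusters. The message-passing mechanism provided by PVM supports efficient heterogeneous network computing but offers no fault tolerance. To reduce the time overhead of synchronous checkpointing, a PVM-based quasi-synchronous checkpointing method is proposed. It retains the advantages of synchronous checkpointing while using message logging to let each node save its state independently, which greatly reduces synchronization overhead and makes checkpoint operations more efficient. The method has been implemented in a PVM environment, and experimental results show that it achieves good fault-tolerance performance.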

13.

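An index-based quasi-synchronous checkpointing protocol (total citations: 3; self-citations: 0; citations by others: 3)

罗元盛 闵应骅 张大方 《Chinese Journal of Computers》2005,28(10):1620-1625

In index-based distributed checkpointing algorithms, minimizing the number of globally consistent checkpoints and forced checkpoints is important for computational efficiency. Building on existing index-based checkpointing algorithms, this paper proposes a new checkpointing protocol that both reduces the number of checkpoints and keeps the checkpoints of the individual processes synchronized in real time, so that rolling back after a failure is not too costly and does not lose too much useful computation. Simulation experiments show that, under the proposed protocol, the number of forced checkpoints induced per message is on average 23.2% lower than with traditional methods.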

14.

An ant algorithm for balanced job scheduling in grids (total citations: 1; self-citations: 1; citations by others: 0)

Ruay-Shiung Jih-Sheng Po-Sheng 《Future Generation Computer Systems》2009,25(1):20-27

Grid computing utilizes distributed heterogeneous resources in order to support complicated computing problems. Grids can be classified into two types: computing grids and data grids. Job scheduling in a computing grid is a very important problem. To utilize grids efficiently, we need a good job scheduling algorithm to assign jobs to resources in grids. In the natural environment, ants have a tremendous ability to team up to find an optimal path to food resources. An ant algorithm simulates the behavior of ants. In this paper, we propose a Balanced Ant Colony Optimization (BACO) algorithm for job scheduling in the Grid environment. The main contributions of our work are to balance the entire system load while trying to minimize the makespan of a given set of jobs. According to the experimental results, BACO outperforms the other job scheduling algorithms it was compared with.

15.

Parallel-machine scheduling to minimize tardiness penalty and power cost

Kuei-Tang Fang Bertrand M.T. Lin 《Computers & Industrial Engineering》2013,64(1):224-234

Traditional research on machine scheduling focuses on job allocation and sequencing to optimize certain objective functions that are defined in terms of job completion times. With regard to environmental concerns, energy consumption becomes another critical issue in high-performance systems. This paper addresses a scheduling problem in a multiple-machine system where the computing speeds of the machines are allowed to be adjusted during the course of execution. The CPU adjustment capability provides the flexibility to reduce electricity cost by sacrificing job completion times. The decision in the studied problem is to dispatch the jobs to the machines and to determine the job sequence and processing speed of each machine, with an objective function comprising the total weighted job tardiness and the power cost. We give a formal formulation, propose two heuristic algorithms, and develop a particle swarm optimization (PSO) algorithm to effectively tackle the problem. Since the existing solution representations do not suitably encode the decisions involved in the studied problem into the PSO algorithm, we design a tailored encoding scheme which can embed all decision information in a particle. A computational study is conducted to investigate the performance of the proposed heuristics and the PSO algorithm.

16.

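Research on task scheduling based on heuristic intelligent algorithms in grid environments

贺智明 曹海霞 《Modern Computer》2006,(1):4-7

Resources and tasks in a grid environment are extremely complex, so scheduling computing tasks across the various resources becomes a key problem, and heuristic intelligent algorithms have proven effective for this class of problems. This paper proposes combining a genetic algorithm with an improved ant algorithm to solve task scheduling in grid environments.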

17.

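A fuzzy-decision multi-dimensional QoS batch scheduling method for the grid market

武斌 杨寿保 徐婧 刘晓茜 《Journal of Chinese Computer Systems》2009,30(12)

In a grid market, users' quality of service (QoS) requirements are more diverse, and as more ordinary users join the market they find it hard to state precise QoS requirements. Scheduling algorithms based on users' fuzzy QoS requirements have therefore become a research focus in grid markets. This paper gives a formal description of multi-dimensional QoS grid scheduling, uses fuzzy decision theory to map users' fuzzy QoS requirements onto grid resources, uses the AHP (analytic hierarchy process) algorithm to determine the weights among the QoS dimensions, and presents a fuzzy-decision multi-dimensional QoS scheduling method. Experiments show that the fuzzy-decision multi-dimensional QoS batch scheduling algorithm effectively meets users' QoS requirements without requiring them to supply precise QoS parameters. Compared with existing QoS batch scheduling methods, it achieves a better first-attempt job completion rate, with smaller fluctuations in that rate.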

18.

Adaptive checkpointing in message passing distributed systems

ROBERTO BALDONI JEAN-MICHEL HELARY ACHOUR MOSTEFAOUI MICHEL RAYNAL 《International journal of systems science》2013,44(11):1145-1161

Determining consistent global checkpoints is common to many distributed problems such as fault tolerance, distributed debugging, and property detection. Uncoordinated and coordinated checkpointing algorithms have traditionally been used for such determinations. This paper addresses a third technique, namely adaptive checkpointing, that has recently emerged. This technique assumes processes take local checkpoints independently and requires them to take additional local checkpoints so that all local checkpoints are members of some consistent global checkpoint. We first study the characteristics of such adaptive algorithms. Then, a general adaptive checkpointing algorithm is designed from a condition, first stated by Netzer and Xu, that answers the following question: ‘does a given local checkpoint belong to a consistent global checkpoint?’ (such a local checkpoint is not useless). The resulting algorithm has the nice property of reducing the number of additional local checkpoints taken to ensure the property ‘no local checkpoint is useless’. Furthermore, it provides each local checkpoint with a consistent global checkpoint to which it belongs. Compared to uncoordinated and coordinated checkpointing algorithms, this algorithm combines the advantages of both without inheriting their drawbacks.

19.

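Resource selection and scheduling algorithms in computational grids (total citations: 3; self-citations: 0; citations by others: 3)

李玺 胡志刚 《Computer Engineering and Applications》2005,41(34):117-119,206

For the computational grid resource environment model described in the paper, a distributed hierarchical task scheduling model is constructed in which scheduling proceeds at two levels: selection of a computing resource site, and local scheduling within that site. Based on this model, a resource selection algorithm built on a dual-objective evaluation function is proposed; by setting the relevant parameters, it can dynamically adjust the relative weights of response time and price in the overall objective. Experimental results show that it can select computing resources that jointly satisfy the response time and price objectives, accommodating the different needs of users.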
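A dual-objective selection of this kind can be sketched as a weighted sum of normalized response time and normalized price. The abstract does not state the actual evaluation function, so the weighted-sum form, the normalization by the maximum value, and the names `select_resource` and `time_weight` are all assumptions made for illustration.

```python
def select_resource(resources, time_weight=0.5):
    """Pick the resource with the lowest weighted cost.

    resources maps a name to (response_time, price); both values are
    assumed to be positive. time_weight in [0, 1] is the share of
    response time in the overall objective; price gets the remainder.
    """
    max_time = max(t for t, _ in resources.values())
    max_price = max(p for _, p in resources.values())

    def score(item):
        t, p = item[1]
        # Normalize each objective to (0, 1] so the weights are comparable.
        return time_weight * t / max_time + (1 - time_weight) * p / max_price

    return min(resources.items(), key=score)[0]
```

Raising `time_weight` toward 1 favors fast but expensive sites, while lowering it toward 0 favors cheap but slow ones, which mirrors the adjustable balance between response time and price that the abstract describes.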

20.

A checkpointed league championship algorithm-based cloud scheduling scheme with secure fault tolerance responsiveness

《Applied Soft Computing》2017

In its simplest structure, cloud computing technology is a massive collection of connected servers residing in a datacenter and continuously changing to provide services to users on demand through a front-end interface. The failure of a task during execution is no longer an accident but a frequent attribute of scheduling systems in a large-scale distributed environment. Recently, some computational intelligence techniques have been widely used to solve scheduling problems in the cloud environment, but only a few emphasize the issue of fault tolerance. This research paper puts forward a Checkpointed League Championship Algorithm (CPLCA) scheduling scheme to be used in the cloud computing system. It is a fault-tolerance-aware task scheduling mechanism that uses a checkpointing strategy in addition to task migration to guard against unexpected failures of independent task executions. The simulation results show that the proposed CPLCA scheme produces an improvement of 41%, 33%, and 23% compared with Ant Colony Optimization (ACO), the Genetic Algorithm (GA), and the basic league championship algorithm (LCA), respectively, as measured by the total average makespan of the schemes. Considering the total average response time of the schemes, the CPLCA scheme produces an improvement of 54%, 57%, and 30% compared with ACO, GA, and LCA, respectively. It also yields a significant decrease in job execution failures, as measured in terms of failure metrics and performance improvement rate. From the results obtained, CPLCA provides an improvement in both task scheduling performance and failure awareness, making it more appropriate for scheduling in the cloud computing model.