1.

A novel approach to resource scheduling for parallel query processing on computational grids

Anastasios Gounaris Rizos Sakellariou Norman W. Paton Alvaro A. A. Fernandes 《Distributed and Parallel Databases》2006,19(2-3):87-106

Advances in network technologies and the emergence of Grid computing have both increased the need for, and provided the infrastructure for, computation- and data-intensive applications to run over collections of heterogeneous and autonomous nodes. In the context of database query processing, existing parallelisation techniques cannot operate well in Grid environments because the way they select machines and allocate tasks compromises partitioned parallelism. The main contribution of this paper is a low-complexity, practical resource selection and scheduling algorithm that enables queries to employ partitioned parallelism in order to achieve better performance in a Grid setting. The evaluation results show that the proposed scheduler outperforms current techniques without sacrificing the efficiency of resource utilisation. Recommended by: Ioannis Vlahavas

2.

Scheduling with due date assignment under special conditions on job processing

Valery Gordon Vitaly Strusevich Alexandre Dolgui 《Journal of Scheduling》2012,15(4):447-456

We review the results on scheduling with due date assignment under conditions on job processing such as given precedence constraints, maintenance activities, or various scenarios of changing processing times. Due date assignment and scheduling problems arise in production planning when management is faced with setting realistic due dates for a number of jobs. Most research on scheduling with due date assignment focuses on the optimal sequencing of independent jobs. However, in practice some products are often manufactured in a certain order implied, for example, by technological, marketing, or assembly requirements, and this can be modeled by imposing precedence constraints on the set of jobs. In classical deterministic scheduling models, the processing conditions, including job processing times, are usually viewed as given constants. In many real-life situations, however, the processing conditions may vary over time, thereby affecting the actual durations of jobs. In models with controllable processing times, the scheduler can speed up job execution by allocating additional resources to the jobs. In models with deterioration or learning, the actual processing time can depend either on the position or on the start time of a job in the schedule. In scheduling with deterioration, the later a job starts, the longer it takes to process, while in scheduling with learning, the later a job is scheduled, the shorter its actual processing time. We also consider scheduling models with an optional maintenance activity. In manufacturing, production scheduling with preventive maintenance planning is one of the most significant methods of preventing machinery failure or wear.
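
To make the deterioration and learning effects concrete, here is a minimal sketch that computes completion times for a fixed job sequence under two illustrative models. The specific forms (linear deterioration p = a + b*t with a common rate b, and positional learning p = a * r**alpha with alpha < 0) are standard textbook examples chosen here as assumptions; the review covers many other variants.

```python
from typing import List, Sequence


def completion_times_deterioration(base: Sequence[float], b: float,
                                   start: float = 0.0) -> List[float]:
    """Assumed linear deterioration: a job starting at time t takes base + b * t."""
    t = start
    completions = []
    for a in base:
        t = t + a + b * t  # the later a job starts, the longer it takes
        completions.append(t)
    return completions


def completion_times_learning(base: Sequence[float], alpha: float) -> List[float]:
    """Assumed positional learning: the job in position r (1-based) takes base * r**alpha."""
    t = 0.0
    completions = []
    for r, a in enumerate(base, start=1):
        t += a * r ** alpha  # with alpha < 0, later positions run faster
        completions.append(t)
    return completions
```

With base times [3, 1, 2] and b = 0.5, this sequence finishes at 10.25, while the order [1, 2, 3] finishes at 8.25, which shows why sequencing decisions matter once processing times depend on start times.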

3.

Grid job scheduling algorithms for mixed compute-intensive and data-intensive workloads

郝永生 卢俊文 刘冠峰 温娜 《Computer Engineering & Science》2014,36(8):1423-1429

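To handle a mix of compute-intensive and data-intensive jobs in a dynamic environment where each job has a time limit, this paper extends traditional grid job scheduling methods and proposes three heuristic grid job scheduling algorithms: Emin-min, Ebest, and Esufferage. The three algorithms are validated on a grid model consisting of multiple clusters connected by a high-speed network. Compared with the Min-min algorithm, all three algorithms perform better. Compared with the ASJS algorithm, Emin-min reduces both the waiting time and the job makespan; Esufferage reduces job waiting time and makespan at the cost of completing fewer jobs; Ebest completes roughly the same number of jobs as ASJS but increases job waiting time and makespan. Overall, Emin-min has a clear advantage.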
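
The three proposed heuristics extend the classic Min-min baseline, whose details are not given in the abstract. The sketch below shows only the standard Min-min heuristic, assuming an expected-time-to-compute matrix `etc[i][m]` (a name introduced here) for task i on machine m; it does not model the paper's job deadlines or data transfer costs.

```python
from typing import Dict, List, Tuple


def min_min(etc: List[List[float]]) -> Tuple[Dict[int, int], float]:
    """Standard Min-min: repeatedly schedule the task with the smallest minimum completion time."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free
    unassigned = set(range(len(etc)))
    assignment: Dict[int, int] = {}
    while unassigned:
        best = None  # (completion_time, task, machine)
        for i in sorted(unassigned):  # sorted for deterministic tie-breaking
            # Machine giving task i its minimum completion time
            m = min(range(n_machines), key=lambda k: ready[k] + etc[i][k])
            ct = ready[m] + etc[i][m]
            if best is None or ct < best[0]:
                best = (ct, i, m)
        ct, i, m = best
        assignment[i] = m
        ready[m] = ct
        unassigned.remove(i)
    return assignment, max(ready)  # makespan = latest machine finish time
```

For `etc = [[4, 6], [3, 5], [8, 2]]`, the heuristic first places task 2 on machine 1, then tasks 1 and 0 on machine 0, for a makespan of 7.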

4.

An optimized Reduce scheduling algorithm based on a delay scheduling strategy

石义龙 林泓 李玉强 王彦 《Application Research of Computers》2017,34(7)

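In large-scale Hadoop clusters, a good task scheduling policy has an important impact on improving data locality, reducing network transfer overhead, shortening job execution time, and increasing cluster job throughput. To address the low data locality of Reduce tasks in the Hadoop architecture, this paper proposes an optimized Reduce task scheduling algorithm based on delay scheduling, which reduces job execution time and increases job throughput by improving the data locality of Reduce tasks. During the Early Shuffle phase of the Hadoop architecture, the algorithm uses a multi-level delay scheduling policy to improve the data locality of Reduce tasks. Finally, the algorithm was implemented by rewriting the code of the native Fair Scheduler and compared experimentally with the native Fair Scheduler. The results show that the algorithm significantly reduces job execution time and increases cluster job throughput.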
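
The abstract does not give the algorithm's thresholds or how Reduce-task locality is measured, so the following is only a generic sketch of multi-level delay scheduling: a job declines non-local slots for a bounded number of scheduling opportunities before relaxing from node-local to rack-local to any node. The thresholds `NODE_LOCAL_WAIT` and `RACK_LOCAL_WAIT`, the `rack_of` map, and the idea that a Reduce task's "preferred nodes" are those holding most of its map output are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical thresholds, counted in declined scheduling opportunities.
NODE_LOCAL_WAIT = 3  # declines tolerated before accepting a rack-local slot
RACK_LOCAL_WAIT = 6  # declines tolerated before accepting any slot


@dataclass
class Task:
    task_id: str
    preferred_nodes: Set[str]  # assumed: nodes holding most of the task's input


@dataclass
class Job:
    pending: List[Task]
    skip_count: int = 0  # scheduling opportunities declined so far


def _launch(job: Job, task: Task) -> Task:
    job.pending.remove(task)
    job.skip_count = 0  # reset the wait once a task is launched
    return task


def offer_slot(job: Job, node: str, rack_of: Dict[str, str]) -> Optional[Task]:
    """Offer a free slot on `node`; return the task to launch, or None to decline."""
    # Level 0: always accept a node-local task.
    for task in job.pending:
        if node in task.preferred_nodes:
            return _launch(job, task)
    # Level 1: after waiting long enough, accept a rack-local task.
    if job.skip_count >= NODE_LOCAL_WAIT:
        rack = rack_of[node]
        for task in job.pending:
            if rack in {rack_of[n] for n in task.preferred_nodes}:
                return _launch(job, task)
    # Level 2: after waiting longer still, accept any slot.
    if job.skip_count >= RACK_LOCAL_WAIT and job.pending:
        return _launch(job, job.pending[0])
    job.skip_count += 1
    return None
```

A job whose input sits on node n1 declines three offers from another rack, still declines a fourth off-rack offer, and then accepts a slot on n2 in the same rack as n1.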

5.

Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments

Najme Mansouri 《Frontiers of Computer Science》2014,8(3):391-408

Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, due to the special issues and goals of Grids, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, due to the limited storage capacity, a good replica replacement algorithm is needed. We present a novel replacement strategy which deletes files in two steps when free space is not sufficient for the new replica: first, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
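
The two-step replacement policy can be sketched as follows. The abstract does not say how "minimum transfer time" is thresholded or how the four second-step factors are combined, so `cheap_threshold` and the `_value` score (accesses times transfer time, divided by size times age) are hypothetical choices, not the paper's formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    size: float           # e.g. MB
    transfer_time: float  # seconds to re-fetch from the nearest other copy
    last_access: float    # timestamp of the most recent request
    accesses: int         # number of accesses


def _value(r: Replica, now: float) -> float:
    """Hypothetical worth score: frequently used, costly-to-refetch, small, recent replicas score high."""
    age = now - r.last_access + 1.0
    return (r.accesses * r.transfer_time) / (r.size * age)


def make_room(replicas: List[Replica], needed: float, free: float,
              now: float, cheap_threshold: float) -> List[str]:
    """Return the names of replicas to evict so that free space reaches `needed`."""
    evicted: List[str] = []
    remaining: List[Replica] = []
    # Step 1: evict replicas that are cheap to re-fetch, lowest transfer time first.
    for r in sorted(replicas, key=lambda r: r.transfer_time):
        if free < needed and r.transfer_time <= cheap_threshold:
            evicted.append(r.name)
            free += r.size
        else:
            remaining.append(r)
    # Step 2: evict the lowest-value replicas until there is enough space.
    for r in sorted(remaining, key=lambda r: _value(r, now)):
        if free >= needed:
            break
        evicted.append(r.name)
        free += r.size
    if free < needed:
        raise ValueError("not enough evictable space for the new replica")
    return evicted
```

Step 2 only runs when Step 1 alone cannot free enough space, mirroring the order described in the abstract.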

6.

Context‐aware scheduling in MapReduce: a compact review

Muhammad Idris Shujaat Hussain Maqbool Ali Arsen Abdulali Muhammad Hameed Siddiqi Byeong Ho Kang Sungyoung Lee 《Concurrency and Computation》2015,27(17):5332-5349

Big data has drawn considerable attention from the computer science research community, business executives, and decision makers. As the volume of data grows, it requires performance-oriented, data-intensive processing frameworks such as MapReduce, which can scale computation on large commodity clusters. Hadoop MapReduce processes data stored in the Hadoop Distributed File System as jobs scheduled by the YARN Fair Scheduler and Capacity Scheduler. However, with advances and dynamic changes in hardware and operating environments, the performance of clusters is greatly affected. Various efforts in the literature have addressed the issues of heterogeneity (i.e., clusters consisting of virtual machines and machines with different hardware), network communication, data locality, better resource utilization, and run-time scheduling. In this paper, we present a survey of the research efforts made so far to improve Hadoop MapReduce scheduling. We classify the scheduling algorithms and techniques proposed in the literature according to the areas they address and present a taxonomy. Furthermore, we discuss open issues and challenges in MapReduce scheduling for improving its performance. Copyright © 2015 John Wiley & Sons, Ltd.

7.

Implementing and evaluating scheduling policies in gLite middleware

A. Kretsis P. Kokkinos E. A. Varvarigos 《Concurrency and Computation》2013,25(3):349-366

Grid scheduling algorithms are usually implemented in a simulation environment, using tools that hide the complexity of the Grid and assumptions that are not always realistic. In our work, we describe the steps followed, the difficulties encountered, and the solutions provided to develop and evaluate, in the gLite Grid middleware, a scheduling policy initially implemented in a simulation environment. Our focus is on a scheduling algorithm that fairly allocates the available resources among the requesting users or jobs. During the actual implementation of this algorithm in gLite, we observed that the validity of the information used by the scheduler for its decisions greatly affects its performance. To improve the accuracy of this information, we developed an internal feedback mechanism that operates along with the scheduling algorithm. Also, a Grid computation resource cannot be shared concurrently between different users or jobs, making it difficult to provide actual fairness. For this reason, we investigated the use of virtualization technology in the gLite middleware. We carried out a proof-of-concept implementation and an experimental evaluation of our scheduling algorithm in a small gLite testbed, which demonstrates the validity and applicability of our solutions. Copyright © 2012 John Wiley & Sons, Ltd.

8.

Autonomic Clouds on the Grid (total citations: 3; self-citations: 0; citations by others: 3)

Michael A. Murphy Linton Abraham Michael Fenn Sebastien Goasguen 《Journal of Grid Computing》2010,8(1):1-18

Computational clouds constructed on top of existing Grid infrastructure have the capability to provide different entities with customized execution environments and private scheduling overlays. By designing these clouds to be autonomically self-provisioned and adaptable to changing user demands, user-transparent resource flexibility can be achieved without substantially affecting average job sojourn time. In addition, the overlay environment and physical Grid sites represent disjoint administrative and policy domains, permitting cloud systems to be deployed non-disruptively on an existing production Grid. Private overlay clouds administered by, and dedicated to the exclusive use of, individual Virtual Organizations are termed Virtual Organization Clusters. A prototype autonomic cloud adaptation mechanism for Virtual Organization Clusters demonstrates the feasibility of overlay scheduling in dynamically changing environments. Commodity Grid resources are autonomically leased in response to changing private scheduler loads, resulting in the creation of virtual private compute nodes. These nodes join a decentralized private overlay network system called IPOP (IP Over P2P), enabling the scheduling and execution of end user jobs in the private environment. Negligible overhead results from the addition of the overlay, although the use of virtualization technologies at the compute nodes adds modest service time overhead (under 10%) to computationally-bound Grid jobs. By leasing additional Grid resources, a substantial decrease (over 90%) in average job queuing time occurs, offsetting the service time overhead.

9.

Dynamic scheduling of a batch of parallel task jobs on heterogeneous clusters

Jorge G. Barbosa Belmiro Moreira 《Parallel Computing》2011,37(8):428-438

This paper addresses the problem of minimizing the schedule length (makespan) of a batch of jobs with different arrival times. A job is described by a directed acyclic graph (DAG) of parallel tasks. The paper proposes a dynamic scheduling method that adapts the schedule when new jobs are submitted and that may change the processors assigned to a job during its execution. The scheduling method is divided into a scheduling strategy and a scheduling algorithm. We also propose an adaptation of the Heterogeneous Earliest-Finish-Time (HEFT) algorithm, called here P-HEFT, to handle parallel tasks in heterogeneous clusters with good efficiency without compromising the makespan. The results of a comparison of this algorithm with another DAG scheduler, using a simulation of several machine configurations and job types, show that P-HEFT gives a shorter makespan for a single DAG but scores worse for multiple DAGs. Finally, the results of the dynamic scheduling of a batch of jobs using the proposed scheduling method showed significant improvements for more heavily loaded machines when compared to the alternative resource reservation approach.
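
P-HEFT builds on HEFT, so a sketch of the original HEFT ranking and processor selection may help. This is a simplified version, not P-HEFT: tasks are sequential, communication cost is zero between tasks on the same processor, and the insertion-based slot search of full HEFT is omitted (each task is appended after the last task on a processor). The names `succ`, `w`, and `c_avg` are introduced here for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def upward_ranks(succ: Dict[str, List[str]], w_avg: Dict[str, float],
                 c_avg: Dict[Edge, float]) -> Dict[str, float]:
    """rank_u(n) = mean cost of n + max over successors s of (comm cost + rank_u(s))."""
    ranks: Dict[str, float] = {}

    def rank(n: str) -> float:
        if n not in ranks:
            ranks[n] = w_avg[n] + max(
                (c_avg.get((n, s), 0.0) + rank(s) for s in succ.get(n, [])),
                default=0.0,  # exit tasks have no successors
            )
        return ranks[n]

    for n in w_avg:
        rank(n)
    return ranks


def heft(succ: Dict[str, List[str]], w: Dict[str, List[float]],
         c_avg: Dict[Edge, float]) -> Tuple[Dict[str, int], float]:
    """w[n][p] = execution time of task n on processor p (all assumed positive)."""
    procs = len(next(iter(w.values())))
    w_avg = {n: sum(t) / len(t) for n, t in w.items()}
    ranks = upward_ranks(succ, w_avg, c_avg)
    pred: Dict[str, List[str]] = {n: [] for n in w}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    proc_free = [0.0] * procs
    finish: Dict[str, float] = {}
    where: Dict[str, int] = {}
    # Decreasing upward rank is a valid topological order when costs are positive.
    for n in sorted(w, key=lambda n: -ranks[n]):
        best = None  # (earliest finish time, processor)
        for p in range(procs):
            data_ready = max(
                (finish[u] + (0.0 if where[u] == p else c_avg.get((u, n), 0.0))
                 for u in pred[n]),
                default=0.0,
            )
            eft = max(proc_free[p], data_ready) + w[n][p]
            if best is None or eft < best[0]:
                best = (eft, p)
        eft, p = best
        finish[n], where[n] = eft, p
        proc_free[p] = eft
    return where, max(finish.values())
```

On a four-task diamond DAG (A to B and C, both to D) with two processors, this yields ranks A = 10, C = 6.5, B = 5, D = 1.5 and a makespan of 7.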

10.

A multicriteria approach to two-level hierarchy scheduling in grids

Krzysztof Kurowski Jarek Nabrzyski Ariel Oleksiak Jan Węglarz 《Journal of Scheduling》2008,11(5):371-379

In this paper we address a multicriteria scheduling problem for computational Grid systems. We focus on the two-level hierarchical Grid scheduling problem, in which at the first level (the Grid level) a Grid broker makes scheduling decisions and allocates jobs to Grid nodes. Jobs are then sent to the Grid nodes, where local schedulers generate local schedules for each node accordingly. A general approach is presented taking into account preferences of all the stakeholders of Grid scheduling (end-users, Grid administrators, and local resource providers) and assuming a lack of knowledge about job time characteristics. A single-stakeholder, single-criterion version of the approach has been compared experimentally with the existing approaches.

11.

Maximizing availability for task scheduling in computational grid using genetic algorithm

Shiv Prakash Deo Prakash Vidyarthi 《Concurrency and Computation》2015,27(1):193-210

A computational grid provides a widely distributed platform for high-end, compute-intensive applications. Grid scheduling is often carried out to schedule the submitted jobs on the nodes of the grid so that some characteristic parameter is optimized. Availability of the computational nodes is one of the important characteristic parameters; it measures the probability that a node is available for job execution. This paper addresses the availability of the grid computational nodes for job execution and proposes a model to maximize it. The task scheduling problem in grids is NP-hard (nondeterministic polynomial-time hard), and metaheuristic techniques are often applied to solve it. A genetic algorithm, a metaheuristic technique based on evolutionary computation, has been used to solve this complex optimization problem. This work proposes a technique for the grid scheduling problem using a genetic algorithm with the objective of maximizing availability. A simulation experiment is conducted to evaluate the performance of the proposed algorithm, and the results reveal the effectiveness of the model. A comparative study has also been performed. Copyright © 2014 John Wiley & Sons, Ltd.
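
The abstract does not give the paper's chromosome encoding, availability model, or genetic operators, so the following is a generic genetic-algorithm sketch under stated assumptions. Each gene is the node assigned to one task; the hypothetical objective is the log of the product of the assigned nodes' availabilities (one factor per task, independent failures assumed, availabilities in (0, 1]); and a per-node capacity `cap` plus a repair step keep schedules feasible, since otherwise every task would trivially go to the single most available node.

```python
import math
import random
from typing import List


def fitness(assign: List[int], avail: List[float]) -> float:
    """Hypothetical objective: log of the product of per-task node availabilities."""
    return sum(math.log(avail[m]) for m in assign)


def repair(assign: List[int], cap: List[int], avail: List[float]) -> List[int]:
    """Move tasks off over-capacity nodes onto the most available node with spare capacity."""
    load = [0] * len(cap)
    fixed = []
    for m in assign:
        if load[m] >= cap[m]:
            m = max((k for k in range(len(cap)) if load[k] < cap[k]),
                    key=lambda k: avail[k])
        load[m] += 1
        fixed.append(m)
    return fixed


def ga_schedule(n_tasks: int, avail: List[float], cap: List[int],
                pop_size: int = 30, generations: int = 60,
                p_mut: float = 0.1, seed: int = 0) -> List[int]:
    assert sum(cap) >= n_tasks and pop_size >= 2
    rng = random.Random(seed)
    n_nodes = len(avail)

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fitness(a, avail) >= fitness(b, avail) else b

    pop = [repair([rng.randrange(n_nodes) for _ in range(n_tasks)], cap, avail)
           for _ in range(pop_size)]
    best = max(pop, key=lambda s: fitness(s, avail))
    for _ in range(generations):
        children = [best]  # elitism: the best schedule always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_tasks) if n_tasks > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [rng.randrange(n_nodes) if rng.random() < p_mut else g
                     for g in child]  # per-gene mutation
            children.append(repair(child, cap, avail))
        pop = children
        best = max(pop, key=lambda s: fitness(s, avail))
    return best
```

Because of elitism, the best fitness never decreases across generations; the repair step guarantees every returned schedule respects node capacities.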

12.

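A grid scheduling method based on a hierarchical scheduling strategy and dynamic data replication (total citations: 2; self-citations: 0; citations by others: 2)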

赖锦辉 梁松 《Application Research of Computers》2014,31(2):412-416

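To address how to perform task scheduling and data replication effectively in a grid so as to reduce task execution time, this paper proposes a task scheduling algorithm (ISS) and an optimized dynamic data replication algorithm (ODHRA), and builds a scheme that combines the two. In this scheme, ISS jointly considers the number of tasks in the waiting queue, the location of the data required by each task, and the computing capacity of each site; it schedules hierarchically following the network structure and computes a combined task cost with appropriate weight coefficients to find the best region of computing nodes. ODHRA analyzes data transfer time, storage access latency, replica requests waiting in the storage queue, and the distance between nodes to select the best replica location among many replicas; combined with replica placement and replica management, this reduces file access time. Simulation results show that the proposed scheme outperforms the other algorithms in average task execution time.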
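
A weighted combined cost with hierarchical (region first, then site) selection could look like the sketch below. The weights `W_QUEUE`, `W_DATA`, `W_CAP`, the normalizations, and the rule of picking the region with the lowest mean site cost are all hypothetical; the abstract only states that queue length, data location, and site capacity are combined with weight coefficients.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

W_QUEUE, W_DATA, W_CAP = 0.4, 0.4, 0.2  # hypothetical weight coefficients


@dataclass
class Site:
    name: str
    region: str
    queue_len: int   # tasks waiting at the site
    capacity: float  # relative computing capacity
    files: Set[str]  # data files stored locally


def site_cost(site: Site, required: Set[str], cap_max: float, q_max: int) -> float:
    """Lower is better; each term is normalized to [0, 1]."""
    missing = len(required - site.files) / max(len(required), 1)  # data to fetch
    queue = site.queue_len / max(q_max, 1)                        # waiting work
    weak = 1.0 - site.capacity / cap_max                          # lack of capacity
    return W_QUEUE * queue + W_DATA * missing + W_CAP * weak


def hierarchical_select(sites: List[Site], required: Set[str]) -> Site:
    cap_max = max(s.capacity for s in sites)
    q_max = max(s.queue_len for s in sites)
    regions: Dict[str, List[Site]] = {}
    for s in sites:
        regions.setdefault(s.region, []).append(s)

    def mean_cost(r: str) -> float:
        return sum(site_cost(s, required, cap_max, q_max) for s in regions[r]) / len(regions[r])

    # Level 1: pick a region; Level 2: pick the cheapest site inside it.
    best_region = min(regions, key=mean_cost)
    return min(regions[best_region], key=lambda s: site_cost(s, required, cap_max, q_max))
```

Searching only inside the chosen region is what shortens the search; the trade-off is that the selected site need not be the global minimum.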

13.

Enhanced Dynamic Hierarchical Replication and Weighted Scheduling Strategy in Data Grid

Najme Mansouri Gholam Hosein Dastghaibyfard 《Journal of Parallel and Distributed Computing》2013

The Data Grid provides massive aggregated computing resources and distributed storage space to deal with data-intensive applications. Due to the limited resources available in the grid and the production of large volumes of data, efficient use of Grid resources becomes an important challenge. Data replication is a key optimization technique for reducing access latency and managing large data by storing data wisely. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. In this paper two strategies are proposed. The first is a novel job scheduling strategy called Weighted Scheduling Strategy (WSS) that uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers the number of jobs waiting in a queue, the location of the data required by the job, and the computing capacity of the sites. The second is a dynamic data replication strategy called Enhanced Dynamic Hierarchical Replication (EDHR) that improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy. It uses an economic model for file deletion when there is not enough space for the replica. The economic model is based on the future value of a data file. Replica placement plays an important role in obtaining the maximum benefit from replication as well as reducing storage cost and mean job execution time, so it is also considered in this paper. The proposed strategies are implemented in OptorSim, the European Data Grid simulator. Experimental results show that the proposed strategies achieve better performance by minimizing data access time and avoiding unnecessary replication.

14.

A method for computing job execution time in homogeneous Hadoop environments

张霄宏 海林鹏 贾宗璞 沈记全 赵文涛 《Computer Engineering and Applications》2014,50(10):249-252

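Execution time is one of the important factors in job scheduling. By analyzing the execution characteristics of jobs in a Hadoop MapReduce environment, this paper proposes a method that takes the execution times of map tasks and reduce tasks as input and estimates the job execution time. Under certain assumptions, the method obtains the execution times of map and reduce tasks by pre-executing the job. Experimental results show that the error rate of the estimated job execution time is below 7%.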
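
The paper's exact formula is not given in the abstract. A simple "waves" model is one plausible way to turn per-task map and reduce times into a job estimate on a homogeneous cluster: tasks run in waves of size equal to the number of slots, every task of a phase takes the same time, and the reduce phase starts after the map phase ends. These are assumptions for illustration, not the authors' method.

```python
import math


def estimate_job_time(n_map: int, n_reduce: int, map_slots: int, reduce_slots: int,
                      t_map: float, t_reduce: float) -> float:
    """Assumed model: phases run back to back, each in ceil(tasks / slots) waves."""
    map_waves = math.ceil(n_map / map_slots)
    reduce_waves = math.ceil(n_reduce / reduce_slots) if n_reduce else 0
    return map_waves * t_map + reduce_waves * t_reduce


def relative_error(estimate: float, actual: float) -> float:
    """Error rate of an estimate against a measured execution time."""
    return abs(estimate - actual) / actual
```

For 100 map tasks on 20 map slots at 30 s each (5 waves) and 10 reduce tasks on 10 reduce slots at 60 s each (1 wave), the estimate is 210 s; a single extra map task adds a whole wave and raises it to 240 s.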

15.

Research on a job scheduling algorithm for the Hadoop cloud platform of a wind farm data center

罗贤缙 岳黎明 甄成刚 《Computer Engineering and Applications》2015,51(15):266-270

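A wind farm data center runs both real-time jobs, such as condition monitoring and data acquisition, and non-real-time jobs; a C/S architecture suffers from unbalanced resource utilization and high management and maintenance costs. This paper designs a data center architecture based on a Hadoop cloud platform. Because the existing FIFO scheduler of the open-source Hadoop platform cannot meet the requirements of a real-time monitoring system, a dual-queue job scheduler is designed on top of the original FIFO scheduler; it makes scheduling decisions by jointly considering job deadlines and priorities. Experimental results show that, compared with the FIFO scheduler, the dual-queue scheduler performs better when the cluster load is heavy and ensures that real-time jobs are executed first, safeguarding the safe operation of wind turbines.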
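
A dual-queue scheduler that combines deadlines and priorities might be organized as below. The ordering rules are assumptions, since the abstract does not specify them: real-time jobs (those submitted with a deadline) are served first in earliest-deadline order, with higher priority breaking ties, and non-real-time jobs are served by priority and then FIFO submission order.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class DualQueueScheduler:
    """Hypothetical dual-queue scheduler: real-time queue first, batch queue second."""

    def __init__(self) -> None:
        self._realtime: List[Tuple[float, int, int, str]] = []
        self._batch: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # FIFO tie-breaker (unique per submission)

    def submit(self, job_id: str, priority: int, deadline: Optional[float] = None) -> None:
        seq = next(self._seq)
        if deadline is not None:
            # Real-time: earliest deadline first, then higher priority, then FIFO.
            heapq.heappush(self._realtime, (deadline, -priority, seq, job_id))
        else:
            # Non-real-time: higher priority first, then FIFO.
            heapq.heappush(self._batch, (-priority, seq, job_id))

    def next_job(self) -> Optional[str]:
        if self._realtime:
            return heapq.heappop(self._realtime)[-1]
        if self._batch:
            return heapq.heappop(self._batch)[-1]
        return None
```

Strict precedence for the real-time queue can starve batch jobs under sustained real-time load; a production design would likely add aging or a share limit.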

16.

Analysis of Scheduling and Replica Optimisation Strategies for Data Grids Using OptorSim (total citations: 1; self-citations: 0; citations by others: 1)

D. G. Cameron A. P. Millar C. Nicholson R. Carvajal-Schiaffino K. Stockinger F. Zini 《Journal of Grid Computing》2004,2(1):57-69

Many current international scientific projects are based on large-scale applications that are both computationally complex and require the management of large amounts of distributed data. Grid computing is fast emerging as the solution to the problems posed by these applications. To evaluate the impact of resource optimisation algorithms, simulation of the Grid environment can be used to obtain important performance results before any algorithms are deployed on the Grid. In this paper, we study the effects of various job scheduling and data replication strategies and compare them in a variety of Grid scenarios using several performance metrics. We use the Grid simulator OptorSim, and base our simulations on a world-wide Grid testbed for data-intensive high energy physics experiments. Our results show that scheduling algorithms which take into account both the file access cost of jobs and the workload of computing resources are the most effective at optimising computing and storage resources as well as improving job throughput. The results also show that, in most cases, the economy-based replication strategies which we have developed improve Grid performance under changing network loads.
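
The finding that the best schedulers consider both file access cost and resource workload can be illustrated with a small cost function. This is a sketch under assumptions (access cost = file size divided by link bandwidth from the cheapest replica, zero if the file is local; workload = estimated seconds of queued work), not OptorSim's actual implementation; `replicas`, `bandwidth`, and `sites_queue` are hypothetical inputs.

```python
from typing import Dict, List, Tuple


def file_access_cost(site: str, files: Dict[str, float],
                     replicas: Dict[str, List[str]],
                     bandwidth: Dict[Tuple[str, str], float]) -> float:
    """Seconds to bring all input files to `site`, each from its cheapest replica."""
    total = 0.0
    for f, size in files.items():
        total += min(0.0 if holder == site else size / bandwidth[(holder, site)]
                     for holder in replicas[f])
    return total


def choose_site(sites_queue: Dict[str, float], files: Dict[str, float],
                replicas: Dict[str, List[str]],
                bandwidth: Dict[Tuple[str, str], float]) -> str:
    """Pick the site minimizing queued work plus the job's own file access cost."""
    return min(sites_queue,
               key=lambda s: sites_queue[s] + file_access_cost(s, files, replicas, bandwidth))
```

A site holding the data wins only when its queue is not too long: with 100 MB to move over a 10 MB/s link, a remote site with 5 s of queued work beats a data-holding site with 50 s, but not one with 60 s versus 50 s.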

17.

MapReduce job scheduling under a hybrid storage mode

杨振宇 牛天洋 吕敏 《Computer Systems & Applications》2023,32(3):70-85

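In heterogeneous Hadoop cluster scenarios, MapReduce job processing becomes inefficient because erasure-coding and replication storage modes are used together and because server nodes differ in real-time computing power. To alleviate this, this paper implements a scheduling policy that, when multiple jobs run concurrently, dynamically adjusts the task allocation of MapReduce jobs according to data storage conditions and real-time node load. The policy modifies the data storage placement policy of the current Hadoop framework and dynamically controls the task concurrency of each node, achieving a more balanced allocation of resources among concurrent jobs. Experimental results show that, compared with Hadoop's two default job scheduling policies, the proposed scheduling mode shortens job completion time by about 17% and effectively avoids the starvation that some jobs would otherwise face.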

18.

Design and implementation of a real-time scheduling algorithm for MapReduce

刘吉 陈香兰 代栋 孙明明 周学海 《Computer Systems & Applications》2013,22(8):113-119

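MapReduce is an important batch data-processing framework in cloud computing, and sharing a MapReduce cluster among multiple jobs while meeting their real-time requirements is a problem that scheduling algorithms urgently need to solve. This paper proposes a two-phase real-time scheduling algorithm that divides scheduling into inter-job scheduling and intra-job scheduling. For inter-job scheduling, a sampling method and an empirical-value method are used to determine the execution time of subtasks; this parameter is used to build a resource allocation model, and job priorities are determined dynamically for scheduling. Subtasks are scheduled with a delay scheduling policy to preserve computation locality. Experimental results show that, compared with the Fair Scheduler and the FIFO algorithm, the two-phase real-time scheduling algorithm meets the real-time requirements of jobs while maintaining throughput.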

19.

Scheduling parallel jobs on multicore clusters using CPU oversubscription

Gladys Utrera Julita Corbalan Jesús Labarta 《The Journal of Supercomputing》2014,68(3):1113-1140

20.

An ant algorithm for balanced job scheduling in grids (total citations: 1; self-citations: 1; citations by others: 0)

Ruay-Shiung Jih-Sheng Po-Sheng 《Future Generation Computer Systems》2009,25(1):20-27

Grid computing utilizes distributed heterogeneous resources in order to support complicated computing problems. Grids can be classified into two types: computing grids and data grids. Job scheduling in computing grids is a very important problem. To utilize grids efficiently, we need a good job scheduling algorithm to assign jobs to resources in grids. In the natural environment, ants have a tremendous ability to team up to find an optimal path to food resources. An ant algorithm simulates the behavior of ants. In this paper, we propose a Balanced Ant Colony Optimization (BACO) algorithm for job scheduling in the Grid environment. The main contribution of our work is to balance the entire system load while trying to minimize the makespan of a given set of jobs. According to the experimental results, BACO outperforms the other job scheduling algorithms it was compared with.
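
The following is a generic ant colony optimization sketch for assigning jobs to resources, not BACO itself: the abstract does not describe BACO's pheromone representation or its local and global update rules. Here pheromone is kept per (job, resource) pair, the heuristic desirability 1 / (resource ready time + job time) favors lightly loaded resources (which is how load balancing enters), and the best-so-far schedule is reinforced after evaporation. All parameter names and defaults are assumptions; expected times `etc[j][r]` must be positive.

```python
import random
from typing import List, Optional, Tuple


def aco_schedule(etc: List[List[float]], n_ants: int = 10, n_iters: int = 20,
                 alpha: float = 1.0, beta: float = 2.0, rho: float = 0.1,
                 q: float = 1.0, seed: int = 0) -> Tuple[List[int], float]:
    """etc[j][r] = expected (positive) time of job j on resource r."""
    rng = random.Random(seed)
    n_jobs, n_res = len(etc), len(etc[0])
    tau = [[1.0] * n_res for _ in range(n_jobs)]  # pheromone per (job, resource)
    best_assign: Optional[List[int]] = None
    best_span = float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            ready = [0.0] * n_res  # accumulated load per resource for this ant
            assign: List[int] = []
            for j in range(n_jobs):
                # Probability proportional to pheromone^alpha * desirability^beta
                weights = [tau[j][r] ** alpha * (1.0 / (ready[r] + etc[j][r])) ** beta
                           for r in range(n_res)]
                r = rng.choices(range(n_res), weights=weights)[0]
                assign.append(r)
                ready[r] += etc[j][r]
            span = max(ready)
            if span < best_span:
                best_assign, best_span = assign, span
        # Global update: evaporate everywhere, then reinforce the best-so-far schedule.
        for j in range(n_jobs):
            for r in range(n_res):
                tau[j][r] *= (1.0 - rho)
            tau[j][best_assign[j]] += q / best_span
    return best_assign, best_span
```

With a fixed `seed` the search is reproducible, which makes it easier to compare parameter settings such as `beta`, the weight given to load balance.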