首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到16条相似文献,搜索用时 250 毫秒
1.
随着基于Hadoop平台的大数据技术的不断发展和实践的深入,Hadoop YARN资源调度策略在异构集群中的不适用性越发明显。一方面,节点资源无法动态分配,导致优势节点的计算资源浪费、系统性能没有充分发挥;另一方面,现有的静态资源分配策略未考虑作业在不同执行阶段的差异,易产生大量资源碎片。基于以上问题,提出了一种负载自适应调度策略。监控集群执行节点和提交作业的性能信息,利用实时监控数据建模、量化节点的综合计算能力,结合节点和作业的性能信息在调度器上启动基于相似度评估的动态资源调度方案。优化后的系统能够有效识别集群节点的执行能力差异,并根据作业任务的实时需求进行细粒度的动态资源调度,在完善YARN现有调度语义的同时,可作为子级资源调度方案架构在上层调度器下。在Hadoop 2.0上实现并测试该策略,实验结果表明,作业的自适应资源调度策略显著提高了资源利用率,集群并发度提高了2到3倍,时间性能提升了近10%。  相似文献   

2.
风电场数据中心包含状态监测、数据采集等实时类作业和非实时类作业,采用C/S结构存在资源利用率不平衡、管理与维护成本高等缺点。设计了一种基于Hadoop云平台的数据中心架构;针对开源Hadoop平台现有FIFO调度器不能满足实时监测系统要求,在原有FIFO调度器的基础上,设计了一种双队列的作业调度器,综合考虑作业的截止时间和优先级来进行作业调度决策,实验结果表明,与FIFO调度器相比,双队列的作业调度器在集群负载较大时能够表现出较好的性能,保证实时类作业能够优先执行,为风电机组的安全运行提供保障。  相似文献   

3.
为有效提高Hadoop集群作业调度的效率,提出一种基于蚁群算法的自适应作业调度的方案,有效利用蚁群算法正反馈的优势特点,使Hadoop作业调度器更高效地对任务进行分配,提高整体架构的作业性能。实验结果表明,该算法能够很好的平衡资源负载,减少任务的完成时间,提高系统处理任务的性能。  相似文献   

4.
通过研究蚁群算法,针对现有Hadoop调度器的不足,提出一个基于蚁群算法的Hadoop资源感知调度器及其具体实现方案。从而使Hadoop作业调度器可以更有效地对任务进行分配,提高整体架构的作业性能。通过实验证明,利用蚁群算法实现的资源感知调度器在同构环境中虽没有明显改善系统计算速度,但是在异构环境中可以很好提高系统处理任务的性能,降低了运算时间。  相似文献   

5.
针对目前 Hadoop 作业调度方法服务水平不高、资源利用率低的问题,提出了一种改进的 Hadoop 多用户作业调度算法。分析了 Hadoop 现行调度算法存在的不足,提出了基于服务质量(QoS)的作业选择量化和基于遗传算法的任务选择均衡化的方法,最后采用 Hadoop 平台对算法进行了仿真。仿真结果表明,该资源调度方法提高了作业的服务质量,实现了资源的合理调度。  相似文献   

6.
高燕飞  陈俊杰  强彦 《计算机科学》2015,42(9):45-49, 69
目前,云计算环境具有动态、异构和海量多类型任务并发等特征,随着集群规模不断增大、用户QoS不断增多,现有调度算法越来越难以适应动态变化的环境及满足用户的需求。针对Hadoop平台下现有调度器不能根据作业运行状态和资源使用情况进行动态调整的问题,提出了Hadoop下基于作业分类的动态调度算法。该算法在使用朴素贝叶斯分类算法对队列中作业进行分类的过程中,根据各个作业的类型,预先设定类别权值,将队列中的作业分类,并引入效用函数,根据用户提交时的预期完成时间QoS和作业完成情况估算其作业完成时间,实现动态设置作业优先级。实验表明,使用提出的算法不仅能有效减少 作业的分类时间,而且能明显提高 动态性和用户QoS。  相似文献   

7.
在大规模的Hadoop集群中,良好的任务调度策略对提高数据本地性、减小网络传输开销、减少作业执行时间以及提高集群的作业吞吐量都有着重要的影响。本文针对Hadoop架构中Reduce任务的数据本地性较低问题,提出了一种基于延迟调度策略的Reduce任务调度优化算法,通过提高Reduce任务的数据本地性来减少作业执行时间以及提高作业吞吐量,该算法在Hadoop架构的Early Shuffle阶段,使用多级延迟调度策略来提高Reduce任务的数据本地性。最后重写原生公平调度器代码实现了该调度算法,并与原生公平调度器进行了对比实验分析,实验结果表明该算法明显减少了作业执行时间,提高了集群的作业吞吐量。  相似文献   

8.
当今云计算环境下,Hadoop已经成为大数据处理的事实标准。然而云计算具有大规模、高复杂和动态性的特点,容易导致故障的发生,影响Hadoop上运行的作业。虽然Hadoop具有内置的故障检测和恢复机制,但云环境中不同节点负载大小的变化,被调度的作业仍然导致失败。针对此问题提出自响应故障感知的检测调度方法,对异构环境负载能力的不同,而做出服务器快节点和慢节点的判断,把作业分配调度到合适的节点上执行,调整任务决策来尽可能的防止任务失败的发生。最后在Hadoop框架下与基本调度器进行实验性能比较,结果显示该方法减少作业失败率最高达19%,并缩短了作业执行时间,同时也减少CPU和内存的使用。  相似文献   

9.
YARN是Hadoop的一个分布式的资源管理系统,用来提高分布式集群的内存、I/O、网络、磁盘等资源的利用率.然而,YARN的配置参数众多,要对其人工调优并获得最佳的性能费时费力.本文在现有的YARN资源调度器的基础上,结合了一种闭环反馈控制方法,可在集群运行状态下动态地对MapReduce (MR)作业数进行优化,省去了人工调整参数的过程.实验表明,在YARN的容量调度器和公平调度器的基础上使用该方法,相比于默认配置,MR作业完成时间分别减少53%和14%左右.  相似文献   

10.
马肖燕  洪爵 《集成技术》2012,1(3):66-71
目前Hadoop的作业调度算法都是将系统中的多类资源抽象成单一资源,分配给作业的资源均是节点资源中固定大小的一部分,称为插槽。这类基于插槽的算法没有考虑到系统多资源的差异性,忽略了不同类型作业对资源的不同需求,因此导致系统在吞吐量和平均作业完成时间上性能低下。本文研究了多资源环境下公平调度算法在Hadoop中的实现,设计了一种多资源公平调度器MFS(Multi-resource Fair Scheduler)。MFS采用了DRF(Dominant Resource Fairness)调度思想,使用需求向量来描述作业对各类资源的需求,并按照需求向量中各资源的大小给作业分配资源。MFS能更加充分有效地使用系统的各类资源,并能满足不同类型作业对资源的不同需求。实验表明相比于基于插槽的Fair Scheduler与Capacity Scheduler,MFS提高了系统的吞吐量,降低了平均作业完成时间。  相似文献   

11.
针对Hadoop和Spark等大数据分析系统中无先验知识任务的高效执行问题,设计了基于累计工作量(CRW)的任务调度器CRWScheduler。该调度器根据CRW将任务在低权重队列与高权重队列间切换;在为作业分配资源时,同时考虑到作业所在的队列和其瞬时占用资源量,无需作业先验知识即显著提升系统性能。基于Apache Hadoop YARN实现了CRWScheduler原型,在28个节点的基准测试集群上的实验表明,与YARN的公平调度机制相比,作业流时间(JFT)平均降低21%,其中95百分位的作业流时间(JFT)最多降低了35%,并且在与任务级调度程序协作时可获得进一步的性能提升。  相似文献   

12.
Hadoop has been developed as a solution for performing large-scale data-parallel applications in Cloud computing. A Hadoop system can be described based on three factors: cluster, workload, and user. Each factor is either heterogeneous or homogeneous, which reflects the heterogeneity level of the Hadoop system. This paper studies the effect of heterogeneity in each of these factors on the performance of Hadoop schedulers. Three schedulers which consider different levels of Hadoop heterogeneity are used for the analysis: FIFO, Fair sharing, and COSHH (Classification and Optimization based Scheduler for Heterogeneous Hadoop). Performance issues are introduced for Hadoop schedulers, and experiments are provided to evaluate these issues. The reported results suggest guidelines for selecting an appropriate scheduler for Hadoop systems. Finally, the proposed guidelines are evaluated in different Hadoop systems.  相似文献   

13.
In MapReduce model, a job is divided into a series of map tasks and reduce tasks. The execution time of the job is prolonged by some slow tasks seriously, especially in heterogeneous environments. To finish the slow tasks as soon as possible, current MapReduce schedulers launch a backup task on other nodes for each of the slow tasks. However, traditional MapReduce schedulers cannot detect slow tasks correctly since they cannot estimate the progress of tasks accurately (Hadoop home page http://hadoop.apache.org/, 2011; Zaharia et al. in 8th USENIX symposium on operating systems design and implementation, ACM, New York, pp. 29–42, 2008). To solve this problem, this paper proposes a History-based Auto-Tuning (HAT) MapReduce scheduler, which calculates the progress of tasks accurately and adapts to the continuously varying environment automatically. HAT tunes the weight of each phase of a map task and a reduce task according to the value of them in history tasks and uses the accurate weights of the phases to calculate the progress of current tasks. Based on the accurate-calculated progress of tasks, HAT estimates the remaining time of tasks accurately and further launches backup tasks for the tasks that have the longest remaining time. Experimental results show that HAT can significantly improve the performance of MapReduce applications up to 37% compared with Hadoop and up to 16% compared with LATE scheduler.  相似文献   

14.
Each job scheduler in large decentralized load balancing systems generally must consider whether it is advantageous to offload jobs to remote computation servers when the local load is too high. Although processing power may appear to be available at a very distant server, two problems arise due to the transmission delay between the scheduler and server. Predictably, the response time of the job is adversely affected as the job spends valuable time in transit, but a more subtle problem involves the value, or reliability, of the state information regarding job queues. The longer the delay between scheduler and server, the less a scheduler should value the state information of the server (given that the state changes over time). We examine the performance of schedulers in topologies with different average proximity and show a probabilistic algorithm that allows schedulers to dynamically form efficient clusters in the network.  相似文献   

15.
High performance computing (HPC) systems allow researchers and businesses to harness large amounts of computing power needed for solving complex problems. In such systems a job scheduler prioritizes the execution of jobs belonging to users of the system in a manner that allows the system to satisfy performance objectives for various groups of users while simultaneously making efficient use of available resources. Typically, system administrators have the responsibility of manually configuring or tuning the job scheduler such that the performance objectives of user groups as well as system‐level performance objectives are met. Modern job schedulers used in production systems are quite complex. Through detailed trace‐driven simulations, we show that manually tuning the configuration of production schedulers in an environment characterized by multiple performance objectives is very challenging and may not be feasible. To alleviate this problem, this paper describes a toolset that can help a system administrator to automatically configure a scheduler such that the performance objectives for various classes of users in the system as well as other system‐level performance objectives can be satisfied. A unique aspect of this work that differentiates it from the existing work on scheduler tuning is that it has been implemented to work with a widely used production scheduler. Furthermore, in contrast to the existing work it considers the challenging real‐world problem of delivering different levels of performance to different classes of users. System administrators can exploit the toolset to react quickly to changes in performance objectives and workload conditions. Case studies using synthetic and real HPC workloads demonstrate the effectiveness of the technique. Copyright © 2011 John Wiley & Sons, Ltd.  相似文献   

16.
Fairness is an important aspect in queuing systems. Several fairness measures have been proposed in queuing systems in general and parallel job scheduling in particular. Generally, a scheduler is considered unfair if some jobs are discriminated whereas others are favored. Some of the metrics used to measure fairness for parallel job schedulers can imply unfairness where there is no discrimination (and vice versa). This makes them inappropriate. In this paper, we show how the existing approach misrepresents fairness in practice. We then propose a new approach for measuring fairness for parallel job schedulers. Our approach is based on two principles: (i) as jobs have different resource requirements and find different queue/system states, they need not have the same performance for the scheduler to be fair and (ii) to compare two schedulers for fairness, we make comparisons of how the schedulers favor/discriminate individual jobs. We use performance and discrimination trends to validate our approach. We observe that our approach can deduce discrimination more accurately. This is true even in cases where the most discriminated jobs are not the worst performing jobs. Copyright © 2008 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号