
Scheduling method for big data tasks
Cite this article: LI Ziying, SHI Zhenguo. Scheduling method for big data tasks[J]. Journal of Computer Applications, 2020, 40(10): 2923-2928.
Authors: LI Ziying, SHI Zhenguo
Affiliation: School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226001, China
Funding: Natural Science Foundation of Jiangsu Province (18KJB520041); Nantong Science and Technology Project (JC2018132); Open Fund of the Key Laboratory of Safety-Critical Software Development and Verification Technology (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology (NJ2018014).
Abstract: To address the lack of rationality in task division and resource allocation during big data processing, a scheduling method for big data tasks was proposed. First, scheduling theory was introduced to handle big data tasks, helping to establish a reasonable task management system and standardize the task processing flow. Then, based on the nature of big data tasks, the datasets were analyzed and a decision table was introduced to perform attribute reduction, reducing the data volume of big data analysis tasks and improving analysis efficiency. Finally, the fuzzy comprehensive evaluation method was adopted, and its results were used as the basis for task scheduling, improving the rationality of task resource allocation. Experimental results on University of California Irvine (UCI) datasets show that the average prediction accuracy of the proposed scheduling algorithm is 7.42 percentage points higher than that of the Naive Bayes (NB) algorithm, 5.16 percentage points higher than that of the error Back Propagation (BP) algorithm, and 3.74 percentage points higher than that of the Root Mean Square Propagation (RMSProp) algorithm. For datasets with a large number of features, the proposed algorithm improves prediction accuracy significantly over the other algorithms. Compared with the Heterogeneous Critical Path First Synthesis (HCPFS) and Heterogeneous Improved Priority List for Task Scheduling (HIPLTS) algorithms, the proposed algorithm reduces the average Scheduling Length Ratio (SLR) by 12.14% and 4.56% respectively, and increases the average speedup ratio by 7.14% and 42.56% respectively, showing that it can effectively improve the efficiency of task scheduling in big data systems. Overall, the proposed method achieves high prediction accuracy and is efficient and reliable.
Keywords: big data; task scheduling; decision table; attribute reduction; fuzzy comprehensive evaluation
Received: 2020-03-24
Revised: 2020-05-08
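
The abstract says that fuzzy comprehensive evaluation results serve as the basis for task scheduling, but it does not give the factor set, the weights, or the composition operator. The following is only a minimal sketch of the general technique, assuming the common weighted-average operator B = W · R. The factor names, grade values, membership matrices, and the functions `fuzzy_evaluate` and `rank_tasks` are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def fuzzy_evaluate(weights: Sequence[float], membership: Matrix) -> List[float]:
    """Weighted-average fuzzy composition B = W . R, normalized to sum to 1.

    weights[i]       -- importance of factor i (assumed, not from the paper)
    membership[i][j] -- degree to which factor i belongs to grade j
    """
    if len(weights) != len(membership):
        raise ValueError("one membership row is required per factor")
    n_grades = len(membership[0])
    scores = [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_grades)
    ]
    total = sum(scores)
    return [s / total for s in scores]


def rank_tasks(
    tasks: Sequence[Tuple[str, Matrix]],
    weights: Sequence[float],
    grade_values: Sequence[float],
) -> List[str]:
    """Order task names by defuzzified score, highest priority first."""

    def score(item: Tuple[str, Matrix]) -> float:
        b = fuzzy_evaluate(weights, item[1])
        return sum(bj * v for bj, v in zip(b, grade_values))

    return [name for name, _ in sorted(tasks, key=score, reverse=True)]


# Hypothetical factors: data volume, deadline urgency, resource demand.
weights = [0.5, 0.3, 0.2]
# Hypothetical grades: high, medium, low priority.
grade_values = [1.0, 0.5, 0.0]
tasks = [
    ("task_a", [[0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3]]),
    ("task_b", [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
]
print(rank_tasks(tasks, weights, grade_values))
```

In this sketch each task is reduced to a single score by weighting the grade vector, which makes tasks directly sortable. A maximum-membership rule would be an equally plausible way to turn the evaluation vector into a scheduling decision.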