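Analysis of factors affecting efficiency of data distributed parallel application in cloud environment
Cite this article: MA Shengjun (马生俊), CHEN Wanghu (陈旺虎), YU Maoyi (俞茂义), LI Jinrong (李金溶), JIA Wenbo (郏文博). Analysis of factors affecting efficiency of data distributed parallel application in cloud environment [J]. Journal of Computer Applications (计算机应用), 2017, 37(7): 1883-1887.
Authors: MA Shengjun (马生俊), CHEN Wanghu (陈旺虎), YU Maoyi (俞茂义), LI Jinrong (李金溶), JIA Wenbo (郏文博)
Affiliation: College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070
Funding: National Natural Science Foundation of China (61462076).
Abstract: In cloud environments, data distributed parallel applications such as MapReduce are widely used. To address the low execution efficiency and high cost of such applications, Hadoop is taken as a case study. First, the execution process of such applications is analyzed, which shows that data volume, number of nodes, and number of tasks are the main factors affecting their efficiency. Second, the influence of these factors on application efficiency is examined. Finally, experiments show that, for a fixed data volume, increasing the number of nodes does not noticeably improve execution efficiency but greatly increases execution cost, and that when the number of tasks is close to the number of nodes, execution efficiency is higher and cost is lower. These conclusions can inform efficiency optimization of MapReduce-like data distributed parallel applications in cloud environments and serve as a reference for users renting cloud resources.

Keywords: cloud environment; data distributed parallel application; MapReduce; efficiency; cost
Received: 2017-01-16
Revised: 2017-03-11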

Analysis of factors affecting efficiency of data distributed parallel application in cloud environment
MA Shengjun, CHEN Wanghu, YU Maoyi, LI Jinrong, JIA Wenbo. Analysis of factors affecting efficiency of data distributed parallel application in cloud environment [J]. Journal of Computer Applications, 2017, 37(7): 1883-1887.
Authors: MA Shengjun, CHEN Wanghu, YU Maoyi, LI Jinrong, JIA Wenbo
Affiliation: College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China
Abstract: Data distributed parallel applications like MapReduce are widely used in cloud environments. To address the low execution efficiency and high cost of such applications, a case analysis of Hadoop is presented. Firstly, an analysis of the execution process of such applications shows that the data volume, the number of nodes, and the number of tasks are the main factors affecting their execution efficiency. Secondly, the impact of these factors on the execution efficiency of an application is explored. Finally, two rules are derived from a set of experiments. For a given volume of data, increasing the number of nodes alone does not remarkably improve the execution efficiency of a data distributed parallel application, while the execution cost rises considerably. When the number of tasks is nearly equal to the number of nodes, however, such an application achieves higher efficiency at lower cost. The conclusions can help users optimize their data distributed parallel applications and estimate the computing resources they need to rent in a cloud environment.
Keywords: cloud environment; data distributed parallel application; MapReduce; efficiency; cost
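
The two rules stated in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not the authors' model and uses no data from the paper: the function name `estimate_job`, the throughput, startup overhead, and price values are all hypothetical, and it assumes that input is split evenly across tasks, that each node runs one task at a time, and that every rented node is billed for the whole runtime.

```python
import math


def estimate_job(data_gb, nodes, tasks,
                 gb_per_sec=0.05, startup_sec=2.0, price_per_node_sec=0.0001):
    """Toy estimate of (runtime in seconds, rental cost) for one job.

    All parameters and defaults are illustrative assumptions, not values
    taken from the paper or from Hadoop.
    """
    if data_gb <= 0 or nodes < 1 or tasks < 1:
        raise ValueError("data_gb must be positive; nodes and tasks must be >= 1")
    # Each task processes an equal share of the data plus a fixed startup overhead.
    task_sec = startup_sec + (data_gb / tasks) / gb_per_sec
    # With one task slot per node, tasks run in successive "waves".
    waves = math.ceil(tasks / nodes)
    runtime_sec = waves * task_sec
    # Every rented node is billed for the full runtime, busy or idle.
    cost = nodes * runtime_sec * price_per_node_sec
    return runtime_sec, cost


if __name__ == "__main__":
    for nodes, tasks in [(4, 8), (8, 8), (16, 8), (8, 64)]:
        runtime, cost = estimate_job(10, nodes, tasks)
        print(f"nodes={nodes:2d} tasks={tasks:2d} "
              f"runtime={runtime:6.1f}s cost={cost:.4f}")
```

Under these assumptions, going from 8 to 16 nodes with 8 tasks leaves the estimated runtime unchanged while doubling the cost, and running 64 tasks on 8 nodes takes longer than running 8 tasks because the startup overhead is paid in every wave. These outcomes follow from the toy model only and are not a substitute for the paper's measurements.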