MapReduce-based Skyline Query Processing Algorithm
Citation: CUI Wen-xiang, XIAO Ying-yuan, HAO Gang, WANG Hong-ya, DENG Hua-feng. MapReduce-based Skyline Query Processing Algorithm[J]. Computer Science, 2016, 43(6): 35-38, 64.
Authors: CUI Wen-xiang, XIAO Ying-yuan, HAO Gang, WANG Hong-ya, DENG Hua-feng
Affiliation:
CUI Wen-xiang, XIAO Ying-yuan, HAO Gang: School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin 300384, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin 300384, China
WANG Hong-ya: Department of Computer Science and Engineering, Donghua University, Shanghai 201620, China
DENG Hua-feng: School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China
Funding: Supported by the National Natural Science Foundation of China (61170174,5) and the Tianjin Innovation Team Program (TD12-5016).
Received: 2015-06-25
Revised: 2015-09-02
Abstract: The Skyline query is a typical multi-objective optimization query with wide application in multi-objective optimization, data mining, and related fields. Most existing Skyline query processing algorithms assume that the data set resides on a single database server and are therefore designed as serial algorithms for that server. With the rapid growth of data volumes, especially in the big-data setting, traditional single-machine serial Skyline algorithms can no longer meet users' needs. Building on MapReduce, a popular distributed parallel programming framework, this paper studies parallel Skyline query algorithms suited to large data sets. To address the factors that affect MapReduce computation, it improves the existing angle-based data partitioning strategy and proposes the Balanced Angular partitioning strategy. To reduce the computation in the Reduce phase, it also proposes filtering data in advance on the Map side. Experimental results show that the proposed Skyline query algorithm significantly improves system performance.
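The abstract does not give the details of the Balanced Angular strategy or the Map-side filter, so the following Python sketch only illustrates the general idea under stated assumptions: points are 2-D with non-negative coordinates, smaller values are better, "balanced" is approximated by cutting the angle range at equal-count quantiles, and the Map-side filter is a local skyline computed per partition. All function names are hypothetical, and the Map and Reduce phases are simulated in a single process rather than run on a MapReduce cluster.

```python
import math
from collections import defaultdict

def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def local_skyline(points):
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def angle(p):
    """Angular coordinate of a 2-D point with non-negative coordinates, in [0, pi/2]."""
    return math.atan2(p[1], p[0])

def balanced_boundaries(points, n_parts):
    """Assumption: cut the angle range at equal-count quantiles so partitions hold similar numbers of points."""
    angles = sorted(angle(p) for p in points)
    return [angles[len(angles) * i // n_parts] for i in range(1, n_parts)]

def partition_id(p, boundaries):
    """Index of the angular sector that p falls into."""
    a = angle(p)
    return sum(a >= b for b in boundaries)

def map_phase(points, boundaries):
    """Group points by angular sector, then filter each group down to its local skyline."""
    groups = defaultdict(list)
    for p in points:
        groups[partition_id(p, boundaries)].append(p)
    return {k: local_skyline(v) for k, v in groups.items()}

def reduce_phase(local_skylines):
    """Merge the local skylines and remove points dominated across partitions."""
    candidates = [p for sky in local_skylines.values() for p in sky]
    return local_skyline(candidates)

def mapreduce_skyline(points, n_parts=3):
    boundaries = balanced_boundaries(points, n_parts)
    return sorted(reduce_phase(map_phase(points, boundaries)))

if __name__ == "__main__":
    pts = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 2), (7, 3), (9, 1), (8, 8)]
    print(mapreduce_skyline(pts))
```

Because a global skyline point is never dominated, it always survives its partition's local filter, so the Reduce step only has to compare the surviving candidates instead of the full data set.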
Keywords: MapReduce; Skyline; Data partition