首页 | 官方网站   微博 | 高级检索  
     

一种基于MapReduce的最近似k对数据搜索方案
引用本文:刘淑英.一种基于MapReduce的最近似k对数据搜索方案[J].计算机与现代化,2014,0(8):38-45.
作者姓名:刘淑英
作者单位:咸阳师范学院信息工程学院,陕西咸阳712000
基金项目:咸阳师范学院基金资助项目(13XSYK054); 陕西省教学改革项目(13BY90)
摘    要:多种应用场合需要寻找给定数据库中相似度最大的前k对数据。然而由于应用领域需要处理的数据规模呈上升趋势,计算这样的最相似k对数据,难度非常大。提出一种基于MapReduce的最相似k对数据搜索方案,该方案首先将所有数据对分割成多个组,然后提出所有数据对分组算法和核心数据对分组算法,通过单独计算每个组中的最近似k对数据,从所有组的最近似k对数据中选择相似度最高的k对数据,进而确定最近似k对数据。最后基于合成数据和真实数据进行实验,性能评估结果表明本文MapReduce算法的有效性和可扩展性。

关 键 词:数据库    相似度    MapReduce    数据搜索    分组  
收稿时间:2014-08-19

A Top-k Closest Pairs Data Search Scheme Based on MapReduce
LIU Shu-ying.A Top-k Closest Pairs Data Search Scheme Based on MapReduce[J].Computer and Modernization,2014,0(8):38-45.
Authors:LIU Shu-ying
Affiliation:LIU Shu-ying (Institute of Information Engineering of Xianyang Normal University, Xianyang 712000, China)
Abstract:There is a wide range of applications that require finding the top-k most similar pairs of records in a given database.However,computing such top-k similarity joins is a challenging problem today,as there is an increasing trend of applications that expect to deal with vast amounts of data. This paper proposes a top-k closest pairs data search scheme based on MapReduce,firstly,the proposed scheme splits conceptually all pairs of points into partitions,and then the all pair partitioning and the essential pair partitioning methods are proposed,we can correctly find the top-k closest pairs by computing the top-k closest pairs in each partition separately and selecting the top-k closest pairs among the top-k closest pairs from all partitions. We finally perform the experiments with not only synthetic but also real-life data sets. The performance study confirms the effectiveness and scalability of the proposed MapReduce algorithms.
Keywords:database  similarity  MapReduce  data search  partitioning
本文献已被 CNKI 维普 等数据库收录!
点击此处可从《计算机与现代化》浏览原始摘要信息
点击此处可从《计算机与现代化》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号