
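Large-scale data index construction and cluster performance analysis based on Hadoop MapReduce
Cite this article: Chen Chao, Qiang Baohua, Shi Long. Large-scale data index construction and cluster performance analysis based on Hadoop MapReduce[J]. Journal of Guilin University of Electronic Technology, 2012(4): 307-312.
Authors: Chen Chao  Qiang Baohua  Shi Long
Affiliation: School of Computer Science and Engineering, Guilin University of Electronic Technology
Funding: National Natural Science Foundation of China (61163057)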
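Abstract: To meet the time and space requirements of index construction for search engines and to build an efficient distributed index, a distributed cluster environment is set up with Hadoop, and an inverted index over large-scale data is implemented with MapReduce programming. The performance of the Hadoop cluster is evaluated under different network bandwidths, data volumes and numbers of cluster nodes. Experimental results show that the greater the network bandwidth, the higher the processing efficiency of the cluster, and the more nodes the cluster has, the stronger its ability to process large data. It can thus be seen that network communication bandwidth has an influence on Ha...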

Keywords: MapReduce  inverted index  Hadoop cluster
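The abstract does not include the authors' code. As a rough illustration of how an inverted index maps onto the MapReduce model, here is a minimal single-process Python sketch, not the authors' implementation. `map_doc` plays the mapper, emitting a `(term, doc_id)` pair per token; sorting and grouping by term stands in for Hadoop's shuffle-and-sort; `reduce_term` plays the reducer, turning each group into a posting list of `(doc_id, term_frequency)` pairs. The function names, the lowercase alphanumeric tokenizer and the posting format are hypothetical choices for illustration; a real Hadoop job would instead implement Java `Mapper` and `Reducer` classes and run them across the cluster.

```python
import re
from collections import Counter
from itertools import groupby
from operator import itemgetter


def map_doc(doc_id, text):
    """Mapper (hypothetical): emit a (term, doc_id) pair for every token."""
    # Assumed tokenizer: lowercase, runs of ASCII letters/digits only.
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, doc_id


def reduce_term(term, doc_ids):
    """Reducer (hypothetical): collapse all doc_ids seen for one term into a
    posting list of (doc_id, term_frequency) pairs sorted by doc_id."""
    return term, sorted(Counter(doc_ids).items())


def build_inverted_index(docs):
    """Run map, shuffle/sort and reduce in one process over {doc_id: text}."""
    # Map phase: each document is processed independently.
    pairs = [kv for doc_id, text in docs.items() for kv in map_doc(doc_id, text)]
    # Shuffle/sort phase: Hadoop sorts map output by key and groups it so each
    # reduce call sees every value for one key; sort + groupby imitates that.
    pairs.sort(key=itemgetter(0))
    index = {}
    for term, group in groupby(pairs, key=itemgetter(0)):
        _, postings = reduce_term(term, (doc_id for _, doc_id in group))
        index[term] = postings
    return index


if __name__ == "__main__":
    docs = {
        1: "Hadoop MapReduce builds the index",
        2: "The inverted index maps terms to documents",
        3: "index index",
    }
    for term, postings in sorted(build_inverted_index(docs).items()):
        print(term, postings)
```

With the sample documents above, the entry for `"index"` is `[(1, 1), (2, 1), (3, 2)]`. Because each mapper call touches only one document and each reducer call touches only one term, both phases parallelize across nodes, which is what makes the cluster-size experiments meaningful.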

Large scale data index construction and cluster efficiency analysis based on Hadoop MapReduce
Chen Chao,Qiang Baohua,Shi Long.Large scale data index construction and cluster efficiency analysis based on Hadoop MapReduce[J].Journal of Guilin Institute of Electronic Technology,2012(4):307-312.
Authors:Chen Chao  Qiang Baohua  Shi Long
Affiliation:(School of Computer Science and Engineering,Guilin University of Electronic Technology,Guilin 541004,China)
Abstract: To meet the time and space requirements of index construction for search engines and to build an efficient distributed index, Hadoop is used to set up a distributed cluster environment, and an inverted index over large-scale data is implemented with MapReduce programming. The performance of the Hadoop cluster is evaluated under different network bandwidths, data volumes and numbers of cluster nodes. Experimental results show that the greater the network bandwidth, the higher the processing efficiency of the cluster, and the more nodes the cluster has, the stronger its ability to handle large data. The performance of a Hadoop cluster is therefore influenced by network communication bandwidth, and high-speed links between cluster nodes can improve cluster performance.
Keywords:MapReduce  inverted index  Hadoop cluster
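Why bandwidth matters can be seen from the shuffle phase: map output is sent across the network to the reducers, so the time spent moving intermediate data shrinks as link bandwidth grows. The sketch below is an idealized back-of-the-envelope model, not the paper's method or data. It assumes the shuffle volume is split evenly over `nodes` machines, each sending over its own link through a non-blocking switch, treats all shuffle data as crossing the network, and ignores protocol overhead, disk I/O and computation. The function name `shuffle_seconds` and the example figures are hypothetical.

```python
def shuffle_seconds(shuffle_bytes: float, link_bps: float, nodes: int = 1) -> float:
    """Idealized lower bound on network time for the shuffle phase.

    Hypothetical model (not from the paper): the shuffle volume is split
    evenly across `nodes` machines, each sending over its own link of
    `link_bps` bits per second through a non-blocking switch. All shuffle
    data is assumed to cross the network; protocol overhead, disk I/O and
    computation are ignored.
    """
    if shuffle_bytes < 0 or link_bps <= 0 or nodes < 1:
        raise ValueError("need shuffle_bytes >= 0, link_bps > 0, nodes >= 1")
    bits = shuffle_bytes * 8  # bytes -> bits
    return bits / (link_bps * nodes)


if __name__ == "__main__":
    shuffle_volume = 10e9  # 10 GB of map output (hypothetical figure)
    for label, bps in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
        t = shuffle_seconds(shuffle_volume, bps, nodes=4)
        print(f"{label}: {t:.0f} s")
```

Under these assumptions, moving 10 GB of intermediate data over 100 Mbit/s links on 4 nodes takes about 200 s, against about 20 s at 1 Gbit/s. Real clusters fall short of this bound, but the inverse dependence on bandwidth and node count mirrors the trends the abstract reports.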
This article is indexed in CNKI and other databases.