Large scale data index construction and cluster efficiency analysis based on Hadoop MapReduce

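Citation: Chen Chao, Qiang Baohua, Shi Long. Large scale data index construction and cluster efficiency analysis based on Hadoop MapReduce[J]. Journal of Guilin University of Electronic Technology, 2012(4): 307-312.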

Funding: National Natural Science Foundation of China (61163057)

Authors: Chen Chao, Qiang Baohua, Shi Long

Affiliation: School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin 541004, China

Abstract: To meet the time and space requirements of index construction in search engines and to build an efficient distributed index, a distributed cluster environment is set up with Hadoop, and an inverted index over large-scale data is implemented with MapReduce programming. The performance of the Hadoop cluster is evaluated under different network bandwidths, data volumes, and numbers of cluster nodes. Experimental results show that the greater the network bandwidth, the higher the cluster's processing efficiency, and the more nodes the cluster has, the greater its capacity to process large data sets. Network communication bandwidth therefore affects Hadoop cluster performance, and high-speed links between cluster nodes can improve it.

Keywords: MapReduce; inverted index; Hadoop cluster
This article is indexed in CNKI and other databases.
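The abstract describes the MapReduce inverted index only at a high level. As a hedged illustration, the Python sketch below simulates in a single process how an inverted index is commonly built in the MapReduce model. The map step emits (term, (document ID, position)) pairs. A shuffle step, which Hadoop performs between map and reduce, groups those pairs by term. The reduce step then merges each term's pairs into a postings list. This is not the authors' implementation and does not use Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`), the lowercase whitespace tokenizer, and the positional postings format are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(doc_id, text):
    """Hypothetical mapper: emit (term, (doc_id, position)) for every token."""
    # Assumed tokenizer: lowercase and split on whitespace.
    for pos, term in enumerate(text.lower().split()):
        yield term, (doc_id, pos)


def shuffle(pairs):
    """Simulate Hadoop's shuffle/sort: sort by term, then group values per term."""
    ordered = sorted(pairs, key=itemgetter(0))
    for term, group in groupby(ordered, key=itemgetter(0)):
        yield term, [value for _, value in group]


def reduce_phase(term, postings):
    """Hypothetical reducer: merge (doc_id, position) pairs into doc_id -> positions."""
    by_doc = defaultdict(list)
    for doc_id, pos in postings:
        by_doc[doc_id].append(pos)
    # Sorted document IDs and positions keep the postings list deterministic.
    return term, {doc: sorted(positions) for doc, positions in sorted(by_doc.items())}


def build_inverted_index(docs):
    """Run map -> shuffle -> reduce over a dict of {doc_id: text}."""
    mapped = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, postings) for term, postings in shuffle(mapped))


if __name__ == "__main__":
    docs = {"d1": "Hadoop MapReduce index", "d2": "inverted index on Hadoop"}
    index = build_inverted_index(docs)
    print(index["hadoop"])  # {'d1': [0], 'd2': [3]}
```

On a real cluster, `map_phase` and `reduce_phase` would correspond to a Mapper and a Reducer (written in Java, or as scripts run through Hadoop Streaming). The framework, not a local `shuffle` function, would partition, transfer, and sort the intermediate pairs across nodes. Because that transfer crosses the network, the shuffle is one plausible reason the measured cluster performance depends on bandwidth.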
