
Weblog Mining Based on Hadoop

Cite this article:	CHENG Miao, CHEN Hua-ping. Weblog Mining Based on Hadoop[J]. Computer Engineering, 2011, 37(11): 37-39.

Authors:	CHENG Miao, CHEN Hua-ping

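Affiliations:	1. College of Management, University of Science and Technology of China, Hefei 230026; 2. College of Computer Science and Technology, University of Science and Technology of China, Hefei 230026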

Funding:	Doctoral Program Foundation; Science Fund for Creative Research Groups

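Abstract:	Data mining systems that run on a single node hit a computational bottleneck when mining massive Web data sources. To address this problem, the paper exploits the distributed processing and virtualization strengths of cloud computing to design a Weblog analysis platform built on a Hadoop cluster framework, and proposes a hybrid algorithm that can run in a distributed fashion in a cloud computing environment. To further verify the platform's efficiency, the improved algorithm is used on the platform to mine users' preferred access paths from Web logs. Experimental results show that applying a distributed algorithm in the cluster to process large volumes of Weblog files markedly improves the efficiency of Web data mining.
Keywords:	cloud computing; Hadoop framework; Map/Reduce programming model; Weblog mining; genetic algorithm; preferred access path
Received:	2011-04-20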
Weblog Mining Based on Hadoop

CHENG Miao, CHEN Hua-ping. Weblog Mining Based on Hadoop[J]. Computer Engineering, 2011, 37(11): 37-39.

Authors:	CHENG Miao, CHEN Hua-ping

Affiliation:	a. College of Management; b. College of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China

Abstract:	Mass data from the Web are distributed, heterogeneous, and dynamic, so current data mining systems based on a single node have reached a bottleneck. Exploiting the advantages of cloud computing, namely distributed processing and virtualization, this paper presents a Weblog analysis platform built on a Hadoop cluster framework, together with a hybrid algorithm that can run in a distributed manner in the cloud computing environment. To further verify the effectiveness and efficiency of the platform, the improved algorithm is used on it to mine users' preferred access paths from Weblogs. Experimental results show that using a distributed algorithm in the cluster to process a large number of Weblog files significantly improves the efficiency of Web data mining.

Keywords:	cloud computing; Hadoop framework; Map/Reduce; Weblog mining; genetic algorithm; preferred browsing path
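
The Map/Reduce style of processing named in the keywords can be illustrated with a minimal, single-process Python sketch. It is not the paper's hybrid algorithm: it uses no Hadoop APIs and no genetic algorithm, and it only counts page-to-page transitions across user sessions as a simplified stand-in for mining preferred access paths. The session format, the function names, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def map_transitions(session_id, pages):
    """Map step: emit ((from_page, to_page), 1) for each consecutive
    pair of pages in one session. session_id is unused here; it stands
    in for the input key a real Map/Reduce job would receive."""
    for src, dst in zip(pages, pages[1:]):
        yield (src, dst), 1


def reduce_counts(pairs):
    """Shuffle and reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def frequent_transitions(sessions, min_count):
    """Run map then reduce, and keep transitions seen at least
    min_count times (a hypothetical support threshold)."""
    emitted = (
        kv
        for session_id, pages in sessions.items()
        for kv in map_transitions(session_id, pages)
    )
    totals = reduce_counts(emitted)
    return {k: v for k, v in totals.items() if v >= min_count}


# Hypothetical sessions, already parsed from Web log lines.
sessions = {
    "u1": ["/", "/news", "/news/42"],
    "u2": ["/", "/news", "/about"],
    "u3": ["/", "/about"],
}
result = frequent_transitions(sessions, min_count=2)
```

In a real cluster the map and reduce functions would run on different nodes over separate splits of the log files; here both run in one process so the data flow is easy to follow.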
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
