
Network log analysis with SQL-on-Hadoop (基于SQL-on-Hadoop的网络日志分析)
Citation: Si-yu ZHANG, Kai-da JIANG, Jian-wen WEI, Xuan LUO, Hai-yang WANG. Network log analysis with SQL-on-Hadoop[J]. Journal on Communications (通信学报), 2014, 35(Z1): 4-19.
Authors: Si-yu ZHANG, Kai-da JIANG, Jian-wen WEI, Xuan LUO, Hai-yang WANG
Affiliation: 1. Network and Information Center, Shanghai Jiao Tong University, Shanghai 200240, China; 2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Funding: National Natural Science Foundation of China (61371084)
Abstract: With the rapid expansion of network bandwidth, devices, and applications, log management faces the challenge of explosive growth in data volume. A network log analysis platform built on SQL-on-Hadoop stores hundreds of billions of log entries and supports efficient, flexible queries. Several Hadoop columnar storage formats and compression algorithms are benchmarked on a real-world multi-TB dataset, and the Hive and Impala engines are compared on log scanning and statistical query efficiency. Storing logs in Gzip-compressed Parquet reduces log volume by 80% and improves Impala query performance fivefold. Six applications for security incident response, attack detection, and early warning have been developed on the platform and have performed well.

Keywords: log analysis; big data; Hadoop; SQL; network security
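
The abstract credits the storage and query gains to a compressed columnar format. The following is a minimal sketch of that general idea only, not the authors' platform and not the Parquet format itself: it uses the Python standard library to store each column of a synthetic log as a separate gzip-compressed blob, then answers a statistical query by reading a single column. The record fields (`ts`, `src_ip`, `dst_port`, `bytes`) and the helper names `to_columnar` and `count_by` are hypothetical, chosen for illustration.

```python
import gzip
import json
from collections import Counter

# Hypothetical log records; the field names are illustrative, not from the paper.
ROWS = [
    {
        "ts": 1400000000 + i,
        "src_ip": f"10.0.{i % 4}.{i % 50}",
        "dst_port": (80, 443, 53)[i % 3],
        "bytes": 100 + (i * 37) % 900,
    }
    for i in range(3000)
]


def to_columnar(rows):
    """Pivot row records into one gzip-compressed blob per column."""
    columns = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        columns[name] = gzip.compress(json.dumps(values).encode("utf-8"))
    return columns


def count_by(columns, name):
    """Statistical query that decompresses only the column it needs.

    Returns the per-value counts and the number of compressed bytes read.
    """
    blob = columns[name]
    values = json.loads(gzip.decompress(blob).decode("utf-8"))
    return Counter(values), len(blob)


COLUMNS = to_columnar(ROWS)
TOTAL_STORED = sum(len(blob) for blob in COLUMNS.values())
PORT_COUNTS, BYTES_READ = count_by(COLUMNS, "dst_port")

if __name__ == "__main__":
    print(PORT_COUNTS)
    print(f"read {BYTES_READ} of {TOTAL_STORED} compressed bytes")
```

Because the query touches only the `dst_port` blob, it reads a fraction of the stored bytes. Real columnar formats such as Parquet add row groups, encodings, and per-column statistics on top of this basic layout, so the sketch should not be taken as a measure of the figures reported in the abstract.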