Accelerating big data analytics on HPC clusters using two-level storage
Affiliation: 1. School of Computing, Clemson University, Clemson, SC, 29634, United States; 2. Electrical and Computer Engineering, Clemson University, Clemson, SC, 29634, United States; 1. Department of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Narutowicza 11/12, Gdańsk 80-233, Poland; 2. Academic Computer Centre, Narutowicza 11/12, Gdańsk 80-233, Poland; 1. STMicroelectronics, Rousset, France; 2. Aix-Marseille University, CNRS, IM2NP UMR 7334, Marseille, France; 3. University of Côte d'Azur, Polytech'Lab UPR UCA 7498, Sophia-Antipolis, France; 4. STMicroelectronics, Crolles, France
Abstract: Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS, or using the parallel file systems available on HPC clusters, to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former provides memory-speed I/O performance, and the latter provides consistent storage with large capacity. We build a two-level storage system prototype with Tachyon and OrangeFS, and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both read and write. We expect this two-level storage approach to provide insights into system design for big data analytics on HPC clusters.
This article is indexed in ScienceDirect and other databases.